Abstract

To estimate the emission parameters in hidden Markov models, one commonly uses the EM algorithm or one of its variants. Our primary motivation, however, is the Philips speech recognition system, wherein the EM algorithm is replaced by the Viterbi training algorithm. Viterbi training is faster and computationally less involved than EM, but it is also biased and need not even be consistent. We propose an alternative to Viterbi training, adjusted Viterbi training, that has the same order of computational complexity as Viterbi training but gives more accurate estimators. Elsewhere, we studied adjusted Viterbi training for a special case of mixtures, supporting the theory by simulations. This paper proves that adjusted Viterbi training is also possible for more general hidden Markov models.

Keywords: Consistency; EM algorithm; hidden Markov models; parameter estimation; Viterbi training



1 Introduction 

We consider a set of procedures to estimate the emission parameters of a finite-state hidden Markov model given observations $x_1, \ldots, x_n$. Thus, $Y$ is a Markov chain with (finite) state space $S$, transition matrix $\mathbb{P} = (p_{ij})$, and initial distribution $\pi$. To every state $l \in S$ there corresponds an emission distribution $P_l$ with density $f_l$ that is known up to the parametrization $f_l(x;\theta_l)$. When $Y$ reaches state $l$, an observation is emitted according to $P_l$, independently of everything else.






The standard method for finding the maximum likelihood estimator of the emission parameters $\theta_l$ is the EM algorithm, which in the present context is also known as the Baum-Welch or forward-backward algorithm [2, 8, 18]. Since the EM algorithm can in practice be slow and computationally expensive, one seeks reasonable alternatives. One such alternative is Viterbi training (VT). VT is used in speech recognition [13, 20, 21, 22], natural language modeling [12], image analysis [2], and bioinformatics […]. We are also motivated by connections with constrained vector quantization [6]. The basic idea behind VT is to replace the computationally costly expectation (E) step of the EM algorithm by an appropriate maximization step with fewer and simpler computations. In speech recognition, essentially the same training procedure was already described by L. Rabiner et al. in [10, 20] (see also [19]). Rabiner considered this procedure as a variation of the Lloyd algorithm used in vector quantization, referring to Viterbi training as segmental K-means training. The analogy with vector quantization is especially pronounced when the underlying chain is simply a sequence of i.i.d. variables, observations on which are consequently an i.i.d. sample from a mixture distribution. For such mixture models, VT was also described by R. Gray et al. in [1], where the training algorithm was considered in the vector quantization context under the name of entropy constrained vector quantization (ECVQ).

The VT algorithm for estimation of the emission parameters of the hidden Markov model can be described as follows. Using some initial values for the parameters, find a realization of $Y$ that maximizes the likelihood of the given observations. Such an $n$-tuple of states is called a Viterbi alignment. Every Viterbi alignment partitions the sample into subsamples corresponding to the states appearing in the alignment. A subsample corresponding to state $l$ is regarded as an i.i.d. sample from $P_l$ and is used to find $\hat{\mu}_l$, the maximum likelihood estimate of $\theta_l$. These estimates are then used to find an alignment in the next step of the training, and so on. It can be shown that in general this procedure converges in finitely many steps; also, it is usually much faster than the EM algorithm.
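The loop just described can be sketched in Python. This is a minimal illustration under assumptions that are ours, not the paper's: one-dimensional Gaussian emissions $N(\theta_l, 1)$ with known unit variance, known $\pi$ and $\mathbb{P}$, states labelled $0, \ldots, K-1$, and ties in the Viterbi recursion broken in favour of the smallest state index. For this emission family the subsample MLE is simply the subsample mean. The function names `viterbi` and `viterbi_training` are hypothetical.

```python
import numpy as np


def viterbi(x, log_pi, log_P, means):
    """One Viterbi alignment for N(means[l], 1) emissions (log domain).

    Ties are broken by np.argmax, i.e. in favour of the smallest state index.
    """
    n, K = len(x), len(means)
    # log f_l(x_u) up to the additive constant -0.5*log(2*pi), which is the
    # same for every state and therefore does not change any argmax
    log_f = -0.5 * (x[:, None] - means[None, :]) ** 2
    delta = log_pi + log_f[0]                      # log delta_1(l)
    back = np.zeros((n, K), dtype=int)
    for u in range(1, n):
        scores = delta[:, None] + log_P            # scores[l, j] = log(delta_u(l) p_lj)
        back[u] = np.argmax(scores, axis=0)        # one maximizing l per target state j
        delta = scores[back[u], np.arange(K)] + log_f[u]
    v = np.empty(n, dtype=int)
    v[-1] = np.argmax(delta)
    for u in range(n - 1, 0, -1):                  # backtracking
        v[u - 1] = back[u, v[u]]
    return v


def viterbi_training(x, pi, P, theta0, max_iter=100):
    """VT: align, split the sample by state, re-estimate each mean by its subsample mean."""
    x = np.asarray(x, dtype=float)
    theta = np.asarray(theta0, dtype=float).copy()
    log_pi, log_P = np.log(pi), np.log(P)
    for _ in range(max_iter):
        v = viterbi(x, log_pi, log_P, theta)
        new = theta.copy()
        for l in range(len(theta)):
            sub = x[v == l]
            if sub.size:                           # empty subsample: keep theta_l
                new[l] = sub.mean()                # MLE of a Gaussian mean
        if np.array_equal(new, theta):             # estimates stopped changing
            break
        theta = new
    return theta


if __name__ == "__main__":
    x = np.array([0.0, 0.0, 5.0, 5.0])
    pi = np.array([0.5, 0.5])
    P = np.array([[0.9, 0.1], [0.1, 0.9]])
    print(viterbi(x, np.log(pi), np.log(P), np.array([0.0, 5.0])))  # [0 0 1 1]
    print(viterbi_training(x, pi, P, [1.0, 4.0]))                   # [0. 5.]
```

Stopping when the estimates repeat mirrors the finite convergence mentioned above; `max_iter` is only a safeguard.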

Although VT is computationally feasible and converges fast, it has a significant disadvantage: the obtained estimators need not be (local) maximum likelihood estimators; moreover, they are generally biased and inconsistent. (VT does not necessarily increase the likelihood; it is, however, an ascent algorithm maximizing a certain other objective function.) Despite this deficiency, speech recognition experiments do not show any significant degradation of the recognition performance when the EM algorithm is replaced by VT. We see no other explanation of this phenomenon than the "curse of complexity" of HMM-based speech recognition systems themselves.

This paper considers VT largely outside the speech recognition context. We regard the 
VT procedure merely as a parameter estimation method, and we address the following 
question: Is it possible to adjust VT in such a way that the adjusted training still has 
the attractive properties of VT (fast convergence and computational feasibility) and that 
the estimators are, at the same time, "more accurate" than those of the baseline procedure? In particular, we focus on a special property of the EM algorithm that VT lacks. This property ensures that the true parameters are asymptotically a fixed point of the algorithm. In other words, for a sufficiently large sample, the EM algorithm "recognizes" the true parameters and does not change them much. VT does not have this property; even when the initial parameters are correct (and $n$ is arbitrarily large), an iteration of the training procedure would in general disturb them. We thus attempt to modify VT in order to make the true parameters an asymptotic fixed point of the resulting algorithm.
In accomplishing this task it is crucial to understand the asymptotic behavior of $\hat{P}_l^n$, the empirical measures corresponding to the subsamples obtained from the alignment. These measures depend on the set of parameters used by the alignment, and in order for the true parameters to be asymptotically fixed by (adjusted) VT, the following must hold: if $\hat{P}_l^n$ is obtained by the alignment with the true parameters, and $n$ is sufficiently large, then $\hat{\mu}_l^n$, the estimator obtained from $\hat{P}_l^n$, must be close to the true parameters. The latter would hold if

$$\hat{P}_l^n \Rightarrow P_l \quad \text{a.s.} \qquad (1)$$

and if the estimators $\hat{\mu}_l^n$ were continuous¹ at $P_l$ with respect to the convergence in (1). The reason why VT does not enjoy the desired fixed point property is, however, a different one: (1) need not hold in general. Hence, in order to improve VT in the aforementioned sense, one needs to study the asymptotics of the measures $\hat{P}_l^n$. First of all, one needs to know whether there exist any limiting probability measures $Q_l$ such that for every $l \in S$

$$\hat{P}_l^n \Rightarrow Q_l \quad \text{a.s.} \qquad (2)$$

If such limiting measures exist, then under the above continuity assumption, the estimators $\hat{\mu}_l^n$ will converge to $\mu_l$, where

$$\mu_l = \arg\max_{\theta_l} \int \ln f_l(x;\theta_l)\, Q_l(dx).$$

Taking now into account the difference between $\mu_l$ and the true parameter, the appropriate adjustment of VT, the so-called adjusted Viterbi training (VA), can be defined (§2.2).

Let us briefly introduce the main ideas of the paper. Let $X$ stand for the observable subprocess of our HMM. The core of the problem is that the alignment is not defined for infinite sequences of observations; hence the asymptotic behavior of $\hat{P}_l^n$ is not straightforward. To handle this, we introduce the notion of a barrier. Roughly, a barrier is a block of observations from a predefined cylinder set that has the following property: alignments for contiguous subsequences of observations enclosed by barriers can be performed independently of the observations outside these enclosing barriers. A simple example of a barrier is an observation $z$ that determines, or indicates, the underlying state: $x_u = z \Rightarrow y_u = l$, $u \leq n$. This happens if $z$ can only be emitted from $l$. This also implies that any Viterbi alignment has to go through $l$ at time $u$, and in particular, the alignment up to $u$ does not depend on the observations after time $u$. If a realization had many such special $z$'s, then the alignment could be obtained piecewise, gluing together subalignments, one for each segment enclosed by two consecutive $z$'s.

¹Loosely speaking, the requirement is that $\hat{\mu}_l^n$ is consistent.

Barriers are a generalization of this concept. A barrier is characterized by containing a special observation termed a node (of order $r \geq 0$). Suppose a barrier is observed with $x_u$ being its node. The node guarantees the existence of a state $l$ such that any alignment goes through $l$ at time $u$ independently of the observations outside the barrier.

Lemma 3.1 states (under certain assumptions) the existence of a special path, or a block, of $Y$ states such that, first, the path itself occurs with a positive probability, and second, the (conditional) probability of it emitting a barrier is positive. Hence, by ergodicity of the full HMM process, almost every sequence of observations has infinitely many barriers emitted from this special block. Next, we introduce random times $\tau_j$ at which such nodes are emitted. Note that the $\tau_j$'s are unobservable: we do observe the barriers, but without knowing whether or not the underlying Markov chain is going through that special block at the same time. It is, however, not difficult to see that the times $\tau_j$, with increments $T_j = \tau_j - \tau_{j-1}$, are renewal times, and furthermore, the process $X$ is regenerative with respect to the times $\tau_j$ (Proposition 4.2).

Recall that almost every sequence of observations has infinitely many barriers and that every barrier contains a node. For a generic such sequence, let $u_i$ be the times of its nodes. Note that the $u_i$'s are observable and also that for every $j = 1, 2, \ldots$, $\tau_j = u_i$ for some $i \geq j$ (there may be more nodes than those emitted from the special block). Using these $u_i$'s as dividers, we define the infinite alignment piecewise (Definition 4.1). Formally, we have defined a mapping $v : \mathcal{X}^\infty \to S^\infty$, where $\mathcal{X}^\infty$ is the set of all possible observation sequences and $S^\infty$ is the set of all possible state sequences. Hence, $V = v(X)$ is a well-defined alignment process. We consider the two-dimensional process $Z := (X, V)$, and we note that this process is also regenerative with respect to the $\tau_j$'s. We now define empirical measures that are based on the first $n$ elements of $Z$ (Definition 4.2). Using the regenerativity, it is not hard to show that there exists a limit measure $Q_l$ such that these empirical measures converge weakly to $Q_l$ a.s. and $\hat{P}_l^n \Rightarrow Q_l$ a.s. (Theorem 4.4). This is the main result of the paper.

To implement VA in practice, a closed form of $Q_l$ (or $\mu_l$) as a function of the true parameters is necessary. The measures $Q_l$ depend on both the transition and the emission parameters, and computing $Q_l$ can be very difficult. However, in the special case of mixture models, the measures $Q_l$ are easier to find. In [12], VA is described for the mixture case. The simulations in [12] verify that VA indeed recovers the asymptotic fixed point property. Also, since the appropriate adjustment function does not depend on the data, each iteration of VA enjoys the same order of computational complexity (in terms of the sample size) as the baseline VT. Moreover, for commonly used mixtures, such as mixtures of multivariate normal distributions with unknown means and known covariances, the adjustment function is available in a closed form (requiring integration with the mixture densities). Depending on the dimension of the emission, the number of components, and the available computational resources, one can vary the accuracy of the adjustment. We reiterate that, unlike the computations of the EM algorithm, the computations of our adjustment do not involve evaluation and subsequent summation of the mixture density at every data point. Also, instead of calculating the measures $Q_l$ exactly, one can easily simulate them, producing in effect a stochastic version of VA. Although simulations do require extra computations, the overall complexity of the stochastically adjusted VT can still be considerably lower than that of EM, but this, of course, requires further investigation.

2 Adjusted Viterbi training 

In this section, we define the adjusted Viterbi training and we state the main question of 
the paper. We begin with the formal definition of the model. 

2.1 The model 

Let $Y$ be a Markov chain with finite state space $S = \{1, \ldots, K\}$. We assume that $Y$ is irreducible and aperiodic with transition matrix $\mathbb{P} = (p_{ij})$ and initial distribution $\pi$ that is also the stationary distribution of $Y$. We consider a hidden Markov model (HMM) in which to every state $l \in S$ there corresponds an emission distribution $P_l$ on $(\mathcal{X}, \mathcal{B})$. We assume $\mathcal{X}$ and $\mathcal{B}$ are a separable metric space and the corresponding Borel $\sigma$-algebra, respectively. Let $f_l$ be a density function of $P_l$ with respect to a certain dominating measure $\lambda$ on $(\mathcal{X}, \mathcal{B})$. The two most important concrete examples are $(\mathbb{R}^d, \mathcal{B})$ with Lebesgue measure and discrete spaces with the counting measure. We define the support of $P_l$ as the intersection of all closed sets of probability 1 under $P_l$, and denote such supports by $G_l$.

In our model, to any realization $y_1, y_2, \ldots$ of $Y$ there corresponds a sequence of independent random variables $X_1, X_2, \ldots$, where $X_n$ has the distribution $P_{y_n}$. We do not know the realizations $y_n$ (the Markov chain $Y$ is hidden), as we only observe the process $X = X_1, X_2, \ldots$, or, more formally:

Definition 2.1 We say that the stochastic process $X$ is a hidden Markov model if there is a (measurable) function $h$ such that for each $n$,

$$X_n = h(Y_n, e_n), \quad \text{where } e_1, e_2, \ldots \text{ are i.i.d. and independent of } Y. \qquad (3)$$

Hence, the emission distribution $P_l$ is the distribution of $h(l, e_n)$. The distribution of $X$ is completely determined by the chain parameters $(\mathbb{P}, \pi)$ and the emission distributions $P_l$, $l \in S$. Moreover, the processes $Y$ and $X$ have the following properties:

• given $Y_n$, the observation $X_n$ is independent of $Y_m$, $m \neq n$. Thus, the conditional distribution of $X_n$ given $Y_1, Y_2, \ldots$ depends on $Y_n$ only;

• the conditional distribution of $X_n$ given $Y_n$ depends only on the state of $Y_n$ and not on $n$;

• given $Y_1, \ldots, Y_n$, the random variables $X_1, \ldots, X_n$ are independent.

The process $X$ is also mixing and, therefore, ergodic.
2.2 Viterbi alignment and training

Suppose we observe $x_1, \ldots, x_n$, the first $n$ elements of $X$. Throughout the paper, we will also use the shorter notation $x_{1\ldots n}$. A central concept of the paper is the Viterbi alignment, which is any sequence of states $q_{1\ldots n} \in S^n$ that maximizes the likelihood of observing $x_{1\ldots n}$. In other words, the Viterbi alignment is a maximum-likelihood estimate of the realization of $Y_1, \ldots, Y_n$ given $x_1, \ldots, x_n$. In the following, the Viterbi alignment will be referred to as the alignment. We start with the formal definition of the alignment.
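First note that for any sequence $q_{1\ldots n} \in S^n$ of states and sets $B_i \in \mathcal{B}$, $i = 1, \ldots, n$,

$$P(X_1 \in B_1, \ldots, X_n \in B_n,\ Y_1 = q_1, \ldots, Y_n = q_n) = P(Y_1 = q_1, \ldots, Y_n = q_n) \prod_{i=1}^n \int_{B_i} f_{q_i}\, d\lambda,$$

and define $\Lambda(q_{1\ldots n}; x_{1\ldots n})$ to be the likelihood function:

$$\Lambda(q_{1\ldots n}; x_{1\ldots n}) = P(Y_i = q_i,\ i = 1, \ldots, n) \prod_{i=1}^n f_{q_i}(x_i).$$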

Definition 2.2 For each $n \geq 1$, let the set of all the alignments be defined as follows:

$$\mathcal{V}(x_{1\ldots n}) = \{v \in S^n : \forall w \in S^n\ \ \Lambda(v; x_{1\ldots n}) \geq \Lambda(w; x_{1\ldots n})\}. \qquad (4)$$

Any map $v : \mathcal{X}^n \to \mathcal{V}(x_{1\ldots n})$, as well as any element $v \in \mathcal{V}(x_{1\ldots n})$, will also be called an alignment.
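For very small $n$, Definition 2.2 can be checked directly by enumerating $S^n$. The sketch below is illustrative only (its cost is $K^n$) and assumes a finite observation alphabet, so that each $f_l$ is a probability mass function given as a table `f[l][symbol]`, 0-based state labels, and exact `Fraction` arithmetic so that ties in (4) are detected exactly; `likelihood` and `alignments` are hypothetical helper names.

```python
from fractions import Fraction as F
from itertools import product


def likelihood(q, x, pi, P, f):
    """Lambda(q; x) = P(Y_1 = q_1, ..., Y_n = q_n) * prod_i f_{q_i}(x_i)."""
    val = pi[q[0]] * f[q[0]][x[0]]
    for i in range(1, len(x)):
        val *= P[q[i - 1]][q[i]] * f[q[i]][x[i]]
    return val


def alignments(x, pi, P, f):
    """All v in S^n with Lambda(v; x) >= Lambda(w; x) for every w (Definition 2.2)."""
    candidates = list(product(range(len(pi)), repeat=len(x)))
    scores = [likelihood(q, x, pi, P, f) for q in candidates]
    best = max(scores)
    return [q for q, s in zip(candidates, scores) if s == best]


if __name__ == "__main__":
    half = F(1, 2)
    pi = [half, half]
    P = [[half, half], [half, half]]
    f = [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]]    # f[l][symbol]
    print(alignments((0, 1, 1), pi, P, f))           # [(0, 1, 1)]
    print(len(alignments((0, 0, 0), pi, P, [[half, half], [half, half]])))  # 8
```

The second call shows that $\mathcal{V}(x_{1\ldots n})$ need not be a singleton: with uninformative emissions and uniform transitions, all $2^3$ state sequences are alignments.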

Note that alignments require the knowledge of all the parameters of $X$: $(\pi, \mathbb{P})$ and $P_l$, $\forall l \in S$.

Throughout the paper we assume that the sample $x_{1\ldots n}$ is generated by an HMM with transition parameters $(\pi, \mathbb{P})$ and with the emission distributions $f_l(x;\theta_l^*)$, where $\theta^* = (\theta_1^*, \ldots, \theta_K^*)$ are the unknown true parameters. We assume that the transition parameters $\mathbb{P}$ and $\pi$ are known, but the emission densities are known only up to the parametrization $f_l(\cdot;\theta_l)$, $\theta_l \in \Theta_l$. A straightforward generalization to the case when $\psi = (\mathbb{P}, \theta^*)$, all of the free parameters, are unknown, can be found in [13]. In the present case, the likelihood function $\Lambda$ as well as the set of alignments $\mathcal{V}$ can be viewed as functions of $\theta$. In the following, we shall write $\mathcal{V}_\theta$ for the set of alignments using the parameters $\theta$. Also, unless explicitly specified, $v_\theta \in \mathcal{V}_\theta$ will denote an arbitrary element of $\mathcal{V}_\theta$.

The classical method for computing the MLE of $\theta^*$ is the EM algorithm. However, if the dimension of $\mathcal{X}$ is high, $n$ is big, and the $f_l$'s are complex, then EM can be (and often is) computationally involved. For this reason, a shortcut, the so-called Viterbi training, is used. Viterbi training replaces the computationally expensive expectation (E-)step by an appropriate maximization step that is based on the alignment, and is generally computationally cheaper in practice than the expectation. We now describe Viterbi training in the HMM case.
Viterbi training



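1. Choose an initial value $\theta^0 = (\theta_1^0, \ldots, \theta_K^0)$.

2. Given $\theta^j$, obtain the alignment

$$v_{\theta^j}(x_{1\ldots n}) = (v_1, \ldots, v_n)$$

and partition the sample $x_1, \ldots, x_n$ into $K$ subsamples, where the observation $x_u$ belongs to the $l$th subsample if and only if $v_u = l$. Equivalently, we define (at most) $K$ empirical measures

$$\hat{P}_l^n(A; \theta^j, x_{1\ldots n}) := \frac{\sum_{u=1}^n I_{A \times \{l\}}(x_u, v_u)}{\sum_{u=1}^n I_{\{l\}}(v_u)}, \quad A \in \mathcal{B},\ l \in S. \qquad (5)$$

3. For every subsample find the MLE given by

$$\hat{\mu}_l^n(\theta^j, x_{1\ldots n}) = \arg\max_{\theta_l \in \Theta_l} \int \ln f_l(x;\theta_l)\, \hat{P}_l^n(dx; \theta^j, x_{1\ldots n}), \qquad (6)$$

and take

$$\theta_l^{j+1} = \hat{\mu}_l^n(\theta^j, x_{1\ldots n}), \quad l \in S.$$

If for some $l \in S$, $v_u \neq l$ for every $u = 1, \ldots, n$ (the $l$th subsample is empty), then the empirical measure $\hat{P}_l^n$ is formally undefined, in which case we take $\theta_l^{j+1} = \theta_l^j$. We will be omitting this exceptional case from now on.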

Viterbi training can be interpreted as follows. Suppose that at some step $j$, $\theta^j = \theta^*$, and hence $v_{\theta^j}$ is obtained using the true parameters. The training is then based on the assumption that the alignment $v_{1\ldots n} = v(x_{1\ldots n})$ is correct, i.e., $v_u = Y_u$, $u = 1, \ldots, n$. In this case, the empirical measures $\hat{P}_l^n$, $l \in S$, would be obtained from an i.i.d. sample generated from $P_l(\theta^*)$, and the MLE $\hat{\mu}_l^n(\theta^*, x_{1\ldots n})$ would be a natural estimator to use. Clearly, under these assumptions $\hat{P}_l^n(\theta^*, x_{1\ldots n}) \Rightarrow P_l(\theta^*)$ a.s. ("$\Rightarrow$" denotes the weak convergence of probability measures) and, provided that $\{f_l(\cdot;\theta) : \theta \in \Theta_l\}$ is a $P_l$-Glivenko-Cantelli class and $\Theta_l$ is equipped with some suitable metric, $\lim_{n\to\infty} \hat{\mu}_l^n(\theta^*, x_{1\ldots n}) = \theta_l^*$ a.s. Hence, if $n$ is sufficiently large, then $\hat{P}_l^n \approx P_l$ and

$$\theta_l^{j+1} = \hat{\mu}_l^n(\theta^j, x_{1\ldots n}) \approx \theta_l^* = \theta_l^j, \quad \forall l \in S,$$

i.e., $\theta^j = \theta^*$ would be (approximately) a fixed point of the training algorithm.

A weak point of the foregoing argument is that the alignment in general is not correct even when the parameters used to find it are. So, in general, $v_u \neq Y_u$. In particular, this implies that the empirical measures $\hat{P}_l^n(\theta^*, x_{1\ldots n})$ are not obtained from an i.i.d. sample from $P_l(\theta^*)$. Hence, we have no reason to believe that $\hat{P}_l^n(\theta^*, x_{1\ldots n}) \Rightarrow P_l(\theta^*)$ a.s. and $\lim_{n\to\infty} \hat{\mu}_l^n(\theta^*, x_{1\ldots n}) = \theta_l^*$ a.s. Moreover, we do not even know whether the sequences of empirical measures $\{\hat{P}_l^n(\theta^*, x_{1\ldots n})\}$ and MLE estimators $\{\hat{\mu}_l^n(\theta^*, x_{1\ldots n})\}$ converge (a.s.) at all.
In this paper, we prove the existence of probability measures $Q_l(\theta, \theta^*)$ (that depend on both $\theta$, the parameters used to obtain the alignments, and $\theta^*$, the true parameters used to generate the training samples) such that for every $l \in S$,

$$\hat{P}_l^n(\theta^*, x_{1\ldots n}) \Rightarrow Q_l(\theta^*, \theta^*) \quad \text{a.s.} \qquad (7)$$

for a special choice of the alignment $v_{\theta^*} \in \mathcal{V}_{\theta^*}$ used to define $\hat{P}_l^n(\theta^*, x_{1\ldots n})$. (In fact, by adding certain mild restrictions on $P_l$, one can eliminate the dependence of the above result on the particular choice of the alignment $v_{\theta^*} \in \mathcal{V}_{\theta^*}$.) We will also be writing $Q_l(\theta)$ for $Q_l(\theta, \theta)$ whenever appropriate.

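Suppose also that the parameter space $\Theta_l$ is equipped with some metric. Then, under certain consistency assumptions on the classes $\mathcal{F}_l = \{f_l(\cdot;\theta_l) : \theta_l \in \Theta_l\}$, the convergence

$$\lim_{n\to\infty} \hat{\mu}_l^n(\theta^*, x_{1\ldots n}) = \mu_l(\theta^*) \quad \text{a.s.} \qquad (8)$$

can be deduced from (7), where

$$\mu_l(\theta^*) = \arg\max_{\theta_l \in \Theta_l} \int \ln f_l(x;\theta_l)\, Q_l(dx;\theta^*). \qquad (9)$$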
We also show that in general, for the baseline Viterbi training, $Q_l(\theta^*) \neq P_l(\theta^*)$, implying $\mu_l(\theta^*) \neq \theta_l^*$. In an attempt to reduce the bias $\theta_l^* - \mu_l(\theta^*)$, we next propose the adjusted Viterbi training.

Suppose (7) and (8) hold. Based on (9), we now consider the mapping

$$\theta \mapsto \mu_l(\theta), \quad l = 1, \ldots, K. \qquad (10)$$

The calculation of $\mu_l(\theta)$ can be rather involved, and it may have no closed form. Nonetheless, since this function is independent of the sample, we can define the following correction for the bias:

$$\Delta_l(\theta) = \theta_l - \mu_l(\theta), \quad l = 1, \ldots, K. \qquad (11)$$

Thus, the adjusted Viterbi training emerges as follows:

Adjusted Viterbi training 

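1. Choose an initial value $\theta^0 = (\theta_1^0, \ldots, \theta_K^0)$.

2. Given $\theta^j$, perform the alignment and define $K$ empirical measures $\hat{P}_l^n(\theta^j, x_{1\ldots n})$ as in (5).

3. For every $\hat{P}_l^n(\theta^j, x_{1\ldots n})$, find $\hat{\mu}_l^n(\theta^j, x_{1\ldots n})$ as in (6).

4. For each $l$, define

$$\theta_l^{j+1} = \hat{\mu}_l^n(\theta^j, x_{1\ldots n}) + \Delta_l(\theta^j),$$

where $\Delta_l$ is as in (11).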
Note that, as desired, for a sufficiently large $n$, the adjusted training algorithm has $\theta^*$ as its (approximate) fixed point. Indeed, suppose $\theta^j = \theta^*$; then $\hat{\mu}_l^n(\theta^j, x_{1\ldots n}) = \hat{\mu}_l^n(\theta^*, x_{1\ldots n})$. Recalling (8), it then follows that $\hat{\mu}_l^n(\theta^*, x_{1\ldots n}) \approx \mu_l(\theta^*) = \mu_l(\theta^j)$ for all $l \in S$. Hence,

$$\theta_l^{j+1} = \hat{\mu}_l^n(\theta^*, x_{1\ldots n}) + \Delta_l(\theta^*) \approx \mu_l(\theta^*) + \Delta_l(\theta^*) = \theta_l^* = \theta_l^j, \quad l \in S. \qquad (12)$$

In [12], we considered an i.i.d. sequence $X_1, X_2, \ldots$, where $X_i$ has a mixture distribution, i.e., the density of $X_i$ is $\sum_{l=1}^K p_l f_l$. Here $p_l > 0$ are the mixture weights. Such a sequence is an HMM with the transition matrix satisfying $p_{ij} = p_j$. In this particular case, the alignment and the measures $Q_l$ are easy to find. Indeed, for any set of parameters $\theta = (\theta_1, \ldots, \theta_K)$, the alignment $v_\theta$ can be obtained via a Voronoi partition $\mathcal{S}(\theta) = \{S_1(\theta), \ldots, S_K(\theta)\}$, where

$$S_1(\theta) = \{x : p_1 f_1(x;\theta_1) \geq p_j f_j(x;\theta_j),\ \forall j \in S\}, \qquad (13)$$

$$S_l(\theta) = \{x : p_l f_l(x;\theta_l) \geq p_j f_j(x;\theta_j),\ \forall j \in S\} \setminus (S_1 \cup \cdots \cup S_{l-1}), \quad l = 2, \ldots, K. \qquad (14)$$

Now, the alignment can be defined point-wise as follows: $v_\theta(x_1, \ldots, x_n) = (v_\theta(x_1), \ldots, v_\theta(x_n))$, where $v_\theta(x) = l$ if and only if $x \in S_l(\theta)$.
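In the mixture case the alignment needs no dynamic programming. A minimal sketch, assuming one-dimensional Gaussian components with a common known standard deviation and 0-based state labels (`normal_pdf` and `mixture_alignment` are hypothetical names):

```python
import numpy as np


def normal_pdf(x, mean, sd=1.0):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


def mixture_alignment(x, weights, means, sd=1.0):
    """Point-wise alignment v_theta(x_u) = l iff x_u lies in the cell S_l(theta).

    S_l collects the points where p_l f_l(x; theta_l) is maximal; np.argmax
    returns the first maximizer, so a tie goes to the smallest l, mirroring
    the removal of S_1, ..., S_{l-1} in (14).
    """
    x = np.asarray(x, dtype=float)
    dens = np.asarray(weights)[None, :] * normal_pdf(x[:, None], np.asarray(means)[None, :], sd)
    return np.argmax(dens, axis=1)


if __name__ == "__main__":
    # 0.5 is equidistant from both means: a tie, resolved in favour of state 0
    print(mixture_alignment([-1.0, 0.0, 0.5, 2.0], [0.5, 0.5], [0.0, 1.0]))  # [0 0 0 1]
```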

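The convergence (7) now follows immediately from the strong law of large numbers, as $\hat{P}_l^n(\theta^*, x_{1\ldots n}) \Rightarrow Q_l(\theta^*)$ a.s., where

$$q_l(x;\theta^*) \propto f(x;\theta^*)\, I_{S_l(\theta^*)}(x) = C_l \sum_{i=1}^K p_i f_i(x;\theta_i^*)\, I_{S_l(\theta^*)}(x), \quad l = 1, \ldots, K,$$

are the densities of the respective $Q_l(\theta^*)$; here $f(x;\theta^*) = \sum_{i=1}^K p_i f_i(x;\theta_i^*)$ is the mixture density and $C_l$ is the normalizing constant.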

Thus, in the special case of mixtures, the adjustments $\Delta_l$ are easy to calculate and the adjusted Viterbi training is easy to implement. Simulations in [12] have largely supported the expected gain in estimation accuracy due to the adjustment $\Delta$, at a small extra cost for computing $\Delta$. Indeed, this extra computation does not affect the algorithm's overall computational complexity as a function of the sample size, since $\Delta$ depends on the training sample only through $\theta^j$, the current value of the parameter.
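To make the mixture case concrete, here is a sketch of adjusted Viterbi training for a one-dimensional Gaussian mixture with known weights, unit variances and unknown means; these modelling choices, the integration grid and all function names are our assumptions. Because the Gaussian-mean MLE is the sample mean, $\mu_l(\theta)$ is the mean of $Q_l(\theta)$, whose density is the mixture density with parameters $\theta$ restricted to $S_l(\theta)$ and renormalized, and $\Delta_l(\theta) = \theta_l - \mu_l(\theta)$ as in (11). Note that `adjustment` never touches the data, which is why its cost does not grow with $n$.

```python
import numpy as np


def normal_pdf(x, mean, sd=1.0):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


def adjustment(theta, weights, lo=-20.0, hi=20.0, num=40001):
    """Delta_l(theta) = theta_l - mu_l(theta) for a unit-variance Gaussian mixture.

    mu_l(theta) is the mean of Q_l(theta): the mixture density with parameters
    theta, restricted to the cell S_l(theta) and renormalized. The integrals
    are Riemann sums on a uniform grid, whose spacing cancels in the ratio.
    """
    theta = np.asarray(theta, dtype=float)
    weights = np.asarray(weights, dtype=float)
    grid = np.linspace(lo, hi, num)
    comps = weights[:, None] * normal_pdf(grid[None, :], theta[:, None])  # p_l f_l(x)
    mix = comps.sum(axis=0)                      # mixture density f(x; theta)
    cell = np.argmax(comps, axis=0)              # v_theta(x) on the grid
    delta = np.empty_like(theta)
    for l in range(len(theta)):
        mask = cell == l                         # assumes S_l(theta) meets the grid
        mu_l = np.sum(grid[mask] * mix[mask]) / np.sum(mix[mask])
        delta[l] = theta[l] - mu_l
    return delta


def adjusted_viterbi_training(x, weights, theta0, n_iter=50):
    """VA for the mixture case: align, take subsample means, add Delta_l(theta^j)."""
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_iter):
        v = np.argmax(weights[None, :] * normal_pdf(x[:, None], theta[None, :]), axis=1)
        delta = adjustment(theta, weights)       # depends on the data only via theta
        new = theta.copy()
        for l in range(len(theta)):
            sub = x[v == l]
            if sub.size:                         # empty subsample: keep theta_l
                new[l] = sub.mean() + delta[l]
        theta = new
    return theta


if __name__ == "__main__":
    print(adjustment([-1.0, 1.0], [0.5, 0.5]))  # roughly +0.17 and -0.17
```

With equal weights and $\theta = (-1, 1)$, the truncated-normal means give $\mu_1(\theta) \approx -1.17$, so unadjusted VT started at these true means moves away from them by about $0.17$; the adjustment cancels this drift at the population level.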

Due to the time dependence in the general HMM, the convergence (7) does not follow immediately from the law of large numbers. However, the very concept of the adjusted Viterbi training is based on the existence of the $Q_l$-measures. Thus, in order to generalize this concept to an arbitrary HMM, one has to begin with the existence of the $Q_l$-measures, which is the objective of this paper.

3 Nodes and barriers 

In this section, we present some preliminaries that will allow us to prove the convergences (7) and (8). We choose to introduce the necessary concepts gradually, building up the general notions on special cases that we find more intuitive and insightful. For a comprehensive introduction to HMMs and related topics we refer to […], and an overview of the basic concepts related to HMMs follows below in §3.1. We then proceed to the notion of infinite (Viterbi) alignment (§4.2), developing on the way several auxiliary notions such as nodes and barriers.

Throughout the rest of this section, we will be writing $f_l$ and $\mathcal{V}$ for $f_l(\cdot;\theta_l^*)$, the true emission densities, and $\mathcal{V}_{\theta^*}$, the set of alignments with the true parameters, respectively.

3.1 Nodes 

3.1.1 Preliminaries 

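Let $1 \leq u_1 < u_2 < \cdots < u_k \leq n$. Given any sequence $a = (a_1, \ldots, a_n)$, write $a_{u_1 \ldots u_k}$ for $(a_{u_1}, \ldots, a_{u_k})$ and define also the following objects:

$$S^{u_1, \ldots, u_k}_{l_1, \ldots, l_k}(n) = \{v \in S^n : v_{u_1 \ldots u_k} = (l_1, \ldots, l_k)\}.$$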

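Next, given observations $x_{1\ldots n}$, let us introduce the set of constrained likelihood maximizers defined below:

$$\mathcal{W}^{u_1, \ldots, u_k}_{l_1, \ldots, l_k}(x_{1\ldots n}) = \{v \in S^{u_1, \ldots, u_k}_{l_1, \ldots, l_k}(n) : \Lambda(v; x_{1\ldots n}) \geq \Lambda(w; x_{1\ldots n})\ \ \forall w \in S^{u_1, \ldots, u_k}_{l_1, \ldots, l_k}(n)\}.$$

Next, define the scores

$$\delta_u(l) = \max_{q \in S^u_l(u)} \Lambda(q; x_{1\ldots u}), \qquad (15)$$

and notice the trivial case $\delta_1(l) = \pi_l f_l(x_1)$. Then, we have the following recursion (see, for example, [19]):

$$\delta_{u+1}(j) = \max_{l \in S} \big(\delta_u(l)\, p_{lj}\big)\, f_j(x_{u+1}). \qquad (16)$$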

Viterbi training as well as the Viterbi alignment inherit their names from the Viterbi algorithm, which is a dynamic programming algorithm for finding $v \in \mathcal{V}(x_{1\ldots n})$. In fact, due to potential non-uniqueness of such $v$, the Viterbi algorithm requires a selection rule as part of its specification. However, for our purposes we will often be manipulating $\mathcal{V}(x_{1\ldots n})$ as opposed to individual $v$'s, in which case we will also be identifying the entire $\mathcal{V}(x_{1\ldots n})$ with the output of the algorithm. This algorithm is based on recursion (16) and on the following relations:

$$t(u, j) = \{l \in S : \forall i \in S\ \ \delta_u(l)\, p_{lj} \geq \delta_u(i)\, p_{ij}\}, \quad u = 1, \ldots, n-1, \qquad (17)$$

$$\mathcal{V}(x_{1\ldots n}) = \{v \in S^n : \delta_n(v_n) \geq \delta_n(i)\ \forall i \in S,\ \ v_u \in t(u, v_{u+1}),\ 1 \leq u < n\}. \qquad (18)$$
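Relations (15)-(18) can be turned into an enumeration of the whole set $\mathcal{V}(x_{1\ldots n})$ rather than a single alignment: keep every maximizer in $t(u, j)$ and backtrack through all of them. A sketch under assumptions of our own (finite alphabet with emission tables `f[l][symbol]`, 0-based states, exact `Fraction` arithmetic); the function names are hypothetical, and the Python list `t[u]` holds the paper's $t(u+1, \cdot)$ because of 0-based indexing.

```python
from fractions import Fraction as F


def viterbi_sets(x, pi, P, f):
    """Scores delta_u(l) of (15)-(16) and the full sets t(u, j) of (17)."""
    K, n = len(pi), len(x)
    delta = [[pi[l] * f[l][x[0]] for l in range(K)]]   # delta_1(l) = pi_l f_l(x_1)
    t = []
    for u in range(1, n):
        d = delta[-1]
        t_u, new = [], []
        for j in range(K):
            best = max(d[l] * P[l][j] for l in range(K))
            t_u.append({l for l in range(K) if d[l] * P[l][j] == best})
            new.append(best * f[j][x[u]])
        t.append(t_u)
        delta.append(new)
    return delta, t


def all_alignments(x, pi, P, f):
    """Enumerate V(x) via (18): v_n maximizes delta_n and v_u lies in t(u, v_{u+1})."""
    delta, t = viterbi_sets(x, pi, P, f)
    n, K = len(x), len(pi)
    top = max(delta[-1])
    out = []

    def back(u, suffix):            # suffix holds the states at positions u+1, ..., n-1
        if u < 0:
            out.append(tuple(suffix))
            return
        for l in sorted(t[u][suffix[0]]):
            back(u - 1, [l] + suffix)

    for last in range(K):
        if delta[-1][last] == top:
            back(n - 2, [last])
    return sorted(out)


if __name__ == "__main__":
    half = F(1, 2)
    pi = [half, half]
    P = [[F(9, 10), F(1, 10)], [F(1, 10), F(9, 10)]]
    f = [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]]
    delta, t = viterbi_sets((0, 1, 0), pi, P, f)
    print(delta[1])                                  # [Fraction(27, 320), Fraction(27, 320)]
    print(all_alignments((0, 1, 0), pi, P, f))       # [(0, 0, 0)]
```

In the example the second-step scores tie, yet the alignment is unique, because $\delta_3$ and the sets $t(u, j)$ single out one path.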

It can also be shown that

$$\mathcal{W}^n_l(x_{1\ldots n}) = \{v \in S^n_l(n) : v_u \in t(u, v_{u+1}),\ u = 1, \ldots, n-1\}. \qquad (19)$$

We shall also need the following notation:

$$\mathcal{V}^{u_1, \ldots, u_k}_{l_1, \ldots, l_k}(x_{1\ldots n}) = \{v \in \mathcal{V}(x_{1\ldots n}) : v_{u_1 \ldots u_k} = (l_1, \ldots, l_k)\},$$

and we will use the subscript $(l)$ to refer to alignments obtained using $(p_{li})_{i \in S}$ (instead of $\pi$) as the initial distribution. Thus $\mathcal{V}_{(l)}(x_{1\ldots n})$ stands for the set of all such alignments, and

$$\mathcal{V}^{u_1, \ldots, u_k}_{(l)\, l_1, \ldots, l_k}(x_{1\ldots n}) = \{v \in \mathcal{V}_{(l)}(x_{1\ldots n}) : v_{u_1 \ldots u_k} = (l_1, \ldots, l_k)\}.$$

Similarly, $\mathcal{W}^{u_1, \ldots, u_k}_{(l)\, l_1, \ldots, l_k}(x_{1\ldots n})$ will refer to the constrained alignments obtained using $(p_{li})_{i \in S}$ as the initial distribution. The following Proposition and Corollary reveal more structure of the alignments.

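Proposition 3.1 Let $1 \leq u < n$; then

$$\mathcal{W}^u_l(x_{1\ldots n}) = \mathcal{W}^u_l(x_{1\ldots u}) \times \mathcal{V}_{(l)}(x_{u+1\ldots n}), \qquad (20)$$

$$\mathcal{V}^u_l(x_{1\ldots n}) \neq \emptyset \implies \mathcal{V}^u_l(x_{1\ldots n}) = \mathcal{W}^u_l(x_{1\ldots n}). \qquad (21)$$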

Proof. The Markov property implies that for any $q = (q_1, \ldots, q_n)$,

$$\Lambda(q; x_{1\ldots n}) = \Lambda(q_{1\ldots u}; x_{1\ldots u}) \cdot \Lambda(q_{u+1\ldots n}; x_{u+1\ldots n} \mid q_u),$$

where

$$\Lambda(q_{u+1\ldots n}; x_{u+1\ldots n} \mid l) = P(Y_{u+1\ldots n} = q_{u+1\ldots n} \mid Y_u = l) \prod_{i=u+1}^n f_{q_i}(x_i).$$

Thus, (20) follows from the equivalence between maximizing $\Lambda(q; x_{1\ldots n})$ over $S^u_l(n)$ on the one hand, and maximizing $\Lambda(q_{1\ldots u}; x_{1\ldots u})$ and $\Lambda(q_{u+1\ldots n}; x_{u+1\ldots n} \mid l)$ over $S^u_l(u)$ and $S^{n-u}$, respectively and independently, on the other. (21) follows immediately from the definitions of the involved sets. ■

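Corollary 3.1

$$\mathcal{V}^u_l(x_{1\ldots n}) \neq \emptyset \ \text{ and } \ \mathcal{V}^u_l(x_{1\ldots u}) \neq \emptyset \implies \mathcal{V}^u_l(x_{1\ldots n}) = \mathcal{V}^u_l(x_{1\ldots u}) \times \mathcal{V}_{(l)}(x_{u+1\ldots n}). \qquad (22)$$

Proof. The hypotheses of (22) together with (21) imply $\mathcal{V}^u_l(x_{1\ldots n}) = \mathcal{W}^u_l(x_{1\ldots n})$ and $\mathcal{V}^u_l(x_{1\ldots u}) = \mathcal{W}^u_l(x_{1\ldots u})$. The latter statements and (20) yield the claim. ■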

3.1.2 Nodes and alignment

We aim at extending the notion of alignment to infinite HMMs. In order to fulfil this objective, we investigate properties of finite alignments (e.g., Propositions 3.1 and 3.2) and identify necessary ingredients (e.g., "node" and "barrier") for the development of the extended theory. We start with the notion of nodes:

Definition 3.1 For $1 \leq u \leq n$, we call $x_u$ an $l$-node if

$$\delta_u(l)\, p_{lj} \geq \delta_u(i)\, p_{ij}, \quad \forall i, j \in S. \qquad (23)$$

We also say that $x_u$ is a node if it is an $l$-node for some $l \in S$.
Figure 1 illustrates the newly introduced notion.
Figure 1: An example of the Viterbi algorithm in action. The solid line corresponds to the final alignment $v_{1\ldots n}$. The dashed links are of the form $(k, l) - (k+1, j)$ with $l \in t(k, j)$ and are not part of the final alignment. E.g., the links $(1,3) - (2,2) - (3,3)$ are present because $3 \in t(1,2)$ and $2 \in t(2,3)$. The observation $x_u$ is a 2-node, since we have $2 \in t(u, j)$ $\forall j \in S$. We also see that $v_{1\ldots u}$ is fixed.



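Proposition 3.2

$$x_u \text{ is an } l\text{-node} \iff l \in t(u, j)\ \ \forall j \in S, \qquad (24)$$

$$\implies \mathcal{V}^u_l(x_{1\ldots u}) \neq \emptyset, \qquad (25)$$

$$\implies \forall v \in \mathcal{V}(x_{1\ldots n}),\ \forall v^* \in \mathcal{V}^u_l(x_{1\ldots u}):\ (v^*, v_{u+1\ldots n}) \in \mathcal{V}^u_l(x_{1\ldots n}), \qquad (26)$$

$$\implies \mathcal{V}^u_l(x_{1\ldots n}) \neq \emptyset, \qquad (27)$$

$$\implies \text{right-hand side of (22)}. \qquad (28)$$

Whether $x_u$ is a node does not depend on $x_i$, $i > u$.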

Proof. The final statement follows immediately from Definition 3.1 and (15), and (24) also follows immediately from Definition 3.1 and (17). Summing both sides of (23) over $j \in S$, we obtain

$$\delta_u(l) \geq \delta_u(i), \quad \forall i \in S, \qquad (29)$$

hence (25) holds by (15). Note that (26) means that any alignment $v \in \mathcal{V}(x_{1\ldots n})$ can be modified by setting $v_u = l$ and taking $v_i^* \in t(i, v_{i+1}^*)$ for $i = u-1, u-2, \ldots, 1$, and the modified string remains an alignment, i.e., belongs to $\mathcal{V}(x_{1\ldots n})$. Such a modification is evidently always possible, i.e., $(v^*, v_{u+1\ldots n})$ is well-defined since $\mathcal{V}^u_l(x_{1\ldots u}) \neq \emptyset$. For $u = n$ this holds trivially; for $u < n$ this follows from (24) (as the latter implies $l \in t(u, v_{u+1})$ for any value of $v_{u+1}$) and (18). Also, (26) implies (27). Finally, given (25) and (27), Corollary 3.1 yields (28). ■
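Definition 3.1 is easy to test once the scores $\delta_u(\cdot)$ are available. The sketch below (hypothetical name `node_states`, 0-based states, floating-point scores) returns every $l$ for which $x_u$ is an $l$-node; by (24) this is the same as asking that $l$ belong to $t(u, j)$ for every $j$.

```python
import numpy as np


def node_states(delta_u, P):
    """Return every l for which x_u is an l-node, i.e. (23):
    delta_u(l) * p_lj >= delta_u(i) * p_ij for all i, j in S.
    Equivalently (24): l belongs to t(u, j) for every j."""
    delta_u = np.asarray(delta_u, dtype=float)
    P = np.asarray(P, dtype=float)
    scores = delta_u[:, None] * P            # scores[i, j] = delta_u(i) * p_ij
    col_max = scores.max(axis=0)             # max over i, one value per column j
    return [l for l in range(len(delta_u)) if np.all(scores[l] >= col_max)]


if __name__ == "__main__":
    sticky = [[0.9, 0.1], [0.1, 0.9]]
    print(node_states([0.5, 0.4], sticky))    # []  (not a node)
    print(node_states([0.5, 0.01], sticky))   # [0]
    mixture = [[0.3, 0.7], [0.3, 0.7]]        # p_ij = p_j, as in a mixture model
    print(node_states([0.2, 0.1], mixture))   # [0]
```

The last call uses identical rows, as in a mixture model, where a node is simply a state maximizing $\delta_u$.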

Remark 3.2 Note that a modification of $v \in \mathcal{V}(x_{1\ldots n})$ possibly required to enforce $v_u = l$ when $x_u$ is an $l$-node (see the proof of (26) above) depends only on $x_1, \ldots, x_{u-1}$. Thus, if $x_u$ is an $l$-node and if $v^* \in \mathcal{V}^u_l(x_{1\ldots u})$, then for any $n > u$ and any $x_{u+1}, \ldots, x_n$, (26) always guarantees an alignment $v \in \mathcal{V}(x_{1\ldots n})$ with $v_{1\ldots u} = v^*$, in which case we can call $v^*$ fixed, meaning that $v^*$ can be kept as the substring of the first $u$ components for any alignment based on the extended observations.

The fact that $v \in \mathcal{V}(x_{1\ldots n})$ in general does not imply $v_{1\ldots u} \in \mathcal{V}(x_{1\ldots u})$ complicates the structure of the alignments and furthermore emphasizes the significance of nodes in view of Proposition 3.2 and Remark 3.2.

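Corollary 3.2 Suppose the observations $x_{1\ldots n}$ are such that for some $1 \leq u_1 < u_2 < \cdots < u_k \leq n$, the observations $x_{u_i}$ are $l_i$-nodes, $i = 1, \ldots, k$. Then

$$\emptyset \neq \mathcal{V}^{u_1, \ldots, u_k}_{l_1, \ldots, l_k}(x_{1\ldots n}) = \mathcal{V}^{u_1}_{l_1}(x_{1\ldots u_1}) \times \mathcal{V}^{u_2 - u_1}_{(l_1)\, l_2}(x_{u_1+1 \ldots u_2}) \times \cdots \times \mathcal{V}^{u_k - u_{k-1}}_{(l_{k-1})\, l_k}(x_{u_{k-1}+1 \ldots u_k}) \times \mathcal{V}_{(l_k)}(x_{u_k+1 \ldots n}). \qquad (30)$$

Proof. By (25),

$$\mathcal{V}^{u_i}_{l_i}(x_{1\ldots u_i}) \neq \emptyset, \quad i = 1, \ldots, k.$$

By (27),

$$\mathcal{V}^{u_k}_{l_k}(x_{1\ldots n}) \neq \emptyset, \qquad \mathcal{V}^{u_i}_{l_i}(x_{1\ldots u_{i+1}}) \neq \emptyset, \quad i = 1, \ldots, k-1.$$

From (26), it now follows:

Now use (22) to decompose

$$\mathcal{V}^{u_k}_{l_k}(x_{1\ldots n}) = \mathcal{V}^{u_k}_{l_k}(x_{1\ldots u_k}) \times \mathcal{V}_{(l_k)}(x_{u_k+1 \ldots n}).$$

Use (22) again to decompose:

Proceeding this way, we obtain (30). ■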

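Corollary 3.2 guarantees the existence of an alignment that can be constructed piecewise, i.e.,

$$(v_1, \ldots, v_{k+1}) \in \mathcal{V}(x_{1\ldots n}), \qquad (31)$$

where

$$v_1 \in \mathcal{V}^{u_1}_{l_1}(x_{1\ldots u_1}),\ v_2 \in \mathcal{V}^{u_2 - u_1}_{(l_1)\, l_2}(x_{u_1+1 \ldots u_2}),\ \ldots,\ v_k \in \mathcal{V}^{u_k - u_{k-1}}_{(l_{k-1})\, l_k}(x_{u_{k-1}+1 \ldots u_k}),\ v_{k+1} \in \mathcal{V}_{(l_k)}(x_{u_k+1 \ldots n}).$$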

3.1.3 Proper alignment

If the sets $\mathcal{V}^{u_i - u_{i-1}}_{(l_{i-1})\, l_i}(x_{u_{i-1}+1 \ldots u_i})$, $i = 2, \ldots, k$, as well as $\mathcal{V}_{(l_k)}(x_{u_k+1 \ldots n})$ have a single element each, then the concatenation (31) is unique. Otherwise, a single $v_i$ will need to be selected from $\mathcal{V}^{u_i - u_{i-1}}_{(l_{i-1})\, l_i}(x_{u_{i-1}+1 \ldots u_i})$. Thus, suppose that $x_{u_{i-1}+1 \ldots u_i} = x_{u_{j-1}+1 \ldots u_j}$ and $l_i = l_j$ for some $j \neq i$. Ignoring the fact that the actual probability of such realizations may well be zero in most cases, for technical reasons we are nonetheless going to be general and require that the selection from any $\mathcal{V}^{a}_{(q)\, l}(x_{u+1 \ldots u+a})$ for which $x_u$ and $x_{u+a}$ are $q$- and $l$-nodes, respectively, be made independently of $u$. To achieve this, we impose the following (formally even more restrictive) condition on admissible selection schemes $\{w^{ql}(x_{1\ldots m}) : \mathcal{X}^m \to \mathcal{W}^m_{(q)\, l}(x_{1\ldots m}),\ m = 1, \ldots, n,\ q, l \in S\}$:
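$$\forall q, l \in S,\ \forall m < n,\ \forall x_{1\ldots n} \in \mathcal{X}^n:\quad w_{1\ldots n} = w^{ql}(x_{1\ldots n}) \implies w_{1\ldots m} = w^{q\, w_m}(x_{1\ldots m}). \qquad (32)$$

The condition (32) above simply states that the ties are broken consistently.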

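Definition 3.3 The alignment (31) based on the $l_1, \ldots, l_k$-nodes $x_{u_1}, \ldots, x_{u_k}$ is called proper if for every $i = 2, \ldots, k-1$

$$v_i = w^{l_{i-1} l_i}(x_{u_{i-1}+1 \ldots u_i}),$$

where $\{w^{ql}(x_{1\ldots m}) : \mathcal{X}^m \to \mathcal{W}^m_{(q)\, l}(x_{1\ldots m}),\ m = 1, \ldots, n,\ q, l \in S\}$ is some selection scheme satisfying (32).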

Clearly, there may be many such selection schemes, and the following discussion is valid for all of them (provided the choice is fixed throughout). One such selection scheme is based on taking maxima under the reverse lexicographic order on $S^m$ (for any positive integer $m$). According to this order $\prec$, for $a,b \in S^m$, $a \prec b$ if and only if for some $i$, $1 \le i \le m$, $a_i < b_i$ and $a_j = b_j$ for $j = i+1,\ldots,m$. (Clearly, if neither $a \prec b$ nor $b \prec a$, then $a_j = b_j$ for $j = 1,\ldots,m$, in which case $a$ and $b$ are equal.) It is immediate to verify that (32) holds for

$$w^{ql}(x_{1\ldots m}) \overset{\text{def}}{=} \max_{\prec} \mathcal{W}^l_{(q)}(x_{1\ldots m}), \quad 1 \le m \le n,\ q,l \in S. \qquad (33)$$
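To make the order concrete, here is a minimal Python sketch of the comparison $\prec$ and of the maximal-element selection in (33). It assumes states are encoded as integers and that the candidate set (for instance $\mathcal{W}^l_{(q)}(x_{1\ldots m})$) has already been enumerated as a list of tuples; the function names are illustrative and not taken from the paper.

```python
from typing import Iterable, Sequence, Tuple


def rev_lex_less(a: Sequence[int], b: Sequence[int]) -> bool:
    """True iff a precedes b in the reverse lexicographic order:
    at the LAST coordinate where the two paths differ, a has the smaller state."""
    if len(a) != len(b):
        raise ValueError("paths must have the same length")
    for ai, bi in zip(reversed(a), reversed(b)):
        if ai != bi:
            return ai < bi
    return False  # identical paths: neither precedes the other


def select_max(paths: Iterable[Sequence[int]]) -> Tuple[int, ...]:
    """Maximal element under the reverse lexicographic order, i.e. the selection (33).
    Comparing the reversed tuples lexicographically is equivalent to rev_lex_less."""
    return tuple(max(paths, key=lambda v: tuple(reversed(v))))


candidates = [(1, 2, 3), (2, 1, 3), (1, 1, 3)]
print(select_max(candidates))  # (1, 2, 3): all end in 3, and 2 beats 1 one step earlier
```

Because the comparison starts from the last coordinate, the selected path first maximises its suffix and only then its prefix, which is the intuition behind the consistency condition (32).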

For the sake of concreteness, we are going to refer to this particular selection scheme as the selection, and base all proper alignments on it. Also, since Definition 3.3 does not concern the initial or terminal components of the concatenated alignment (31), we extend the selection (again, purely for the sake of concreteness of the presentation) to these components. Thus, to specify the initial component we take $w^{l}(x_{1\ldots m}) \overset{\text{def}}{=} \max_{\prec} \mathcal{W}^l(x_{1\ldots m})$, $1 \le m \le n$, for all $l \in S$ and for all probability mass functions $\pi$ on $S$. To be concise, we will write $\mathbb{V}W$ for the selected element of $W$ for any $W \subset S^m$ (where $W$ generally depends on $x_{1\ldots m}$). In particular, the final component is then specified via $\mathbb{V}\mathcal{V}_{(l)}(x_{1\ldots m})$.

Example 3.4 Consider an i.i.d. sequence $X_1, X_2, \ldots$, where $X_i$ has a mixture distribution, i.e. the density of $X_i$ is $\sum_{l=1}^{K} p_l f_l$. Here $p_l > 0$ are the mixture weights. Such a sequence is an HMM with the transition matrix satisfying $p_{ij} = p_j$. In this case, an observation $x_u$ is an $l$-node if

$$\delta_u(l) \ge \delta_u(i), \quad \forall i \in S.$$

In particular, this means that every observation is an $l$-node for some $l \in S$. Then (16) becomes

$$\delta_{u+1}(i) = \big(\max_j \delta_u(j)\big)\, p_i f_i(x_{u+1}) \propto p_i f_i(x_{u+1}), \quad \forall i,$$

and

$$\delta_u(l) \ge \delta_u(i),\ \forall i \iff p_l f_l(x_u) \ge p_i f_i(x_u),\ \forall i. \qquad (34)$$

Thus, in a mixture model any observation is a node; more precisely, it is an $l$-node for any $l = \arg\max_j p_j f_j(x_u)$. For this model, the alignment can naturally be concatenated point-wise: $v(x_{1\ldots n}) = (v(x_1),\ldots,v(x_n))$, where

$$v(x) = \arg\max_j p_j f_j(x). \qquad (35)$$



The alignment will be proper if ties in (35) are broken consistently, which is the case, for example, when using the selection (33).
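The point-wise alignment (35) is easy to sketch. In the Python example below the Gaussian components and equal weights are hypothetical choices for illustration. Ties are sent to the largest state index, which is what the reverse lexicographic selection (33) amounts to for a single observation, so the resulting alignment is proper in the sense above.

```python
import math
from typing import Callable, List, Sequence


def normal_pdf(x: float, mean: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def mixture_alignment(xs: Sequence[float], weights: Sequence[float],
                      densities: Sequence[Callable[[float], float]]) -> List[int]:
    """Point-wise alignment v(x) = argmax_j p_j f_j(x), as in (35).
    Ties go to the largest state index (the reverse lexicographic maximum
    for paths of length one). States are returned 1-based to match the text."""
    K = len(weights)
    out = []
    for x in xs:
        best = max(range(K), key=lambda j: (weights[j] * densities[j](x), j))
        out.append(best + 1)
    return out


# Hypothetical two-component mixture: N(-1, 1) and N(1, 1) with equal weights.
dens = [lambda x: normal_pdf(x, -1.0, 1.0), lambda x: normal_pdf(x, 1.0, 1.0)]
print(mixture_alignment([-2.0, 0.0, 2.0], [0.5, 0.5], dens))  # [1, 2, 2]; x = 0 is an exact tie
```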



3.2 r-order nodes

The concept of nodes is both important and rich, but the existence of a node can also be restrictive in the following sense: suppose $x_{1\ldots u}$ is such that $\delta_u(i) > 0$ for every $i$. In this case, (23) is equivalent to

$$\delta_u(l) \ge \max_i \Big( \max_j \frac{p_{ij}}{p_{lj}} \Big) \delta_u(i),$$

and actually implies $p_{lj} > 0$ for every $j \in S$. Hence, one cannot guarantee the existence of an $l$-node for an arbitrary emission distribution, since an ergodic $P$ in general can have a zero in every row, violating the above positivity constraint on the $l$-th row of $P$. We now generalize the notion of nodes in order to eliminate the aforementioned positivity constraint while still enjoying the desirable properties of nodes. We need some additional definitions: for each $u \ge 1$ and $r \ge 1$, let



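$$p^{(r)}_{ij}(u) = \max_{q_1,\ldots,q_r \in S} p_{iq_1} f_{q_1}(x_{u+1})\, p_{q_1q_2} f_{q_2}(x_{u+2})\, p_{q_2q_3} \cdots p_{q_{r-1}q_r} f_{q_r}(x_{u+r})\, p_{q_rj}. \qquad (36)$$

Also, for each $m \ge 1$ define $p^{(0)}_{ij}(m) = p_{ij}$, and notice

$$p^{(r)}_{ij}(u) = \max_{q \in S} p^{(r'-1)}_{iq}(u)\, f_q(x_{u+r'})\, p^{(r-r')}_{qj}(u+r'), \quad \text{for all } r' = 1,2,\ldots,r. \qquad (37)$$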
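The recursion (37) can be checked numerically against the definition (36). The Python sketch below uses exact rational arithmetic (`fractions.Fraction`), implements the case $r' = 1$, and compares it with brute-force enumeration of $q_1,\ldots,q_r$. The transition matrix and emission values are hypothetical; the matrix has a zero in every row, yet $p^{(1)}_{ij}(u)$ comes out strictly positive, which illustrates the point about zeros in $P$ made above.

```python
from fractions import Fraction as F
from itertools import product


def p_r(P, f, u, r):
    """p^{(r)}_{ij}(u) via the r' = 1 case of (37):
    p^{(r)}_{ij}(u) = max_q p_{iq} f_q(x_{u+1}) p^{(r-1)}_{qj}(u+1), with p^{(0)}_{ij}(u) = p_{ij}.
    f[t][q] holds f_q(x_t) for 1-based t; f[0] is unused."""
    K = len(P)
    if r == 0:
        return [list(row) for row in P]
    nxt = p_r(P, f, u + 1, r - 1)
    return [[max(P[i][q] * f[u + 1][q] * nxt[q][j] for q in range(K))
             for j in range(K)] for i in range(K)]


def p_r_brute(P, f, u, r):
    """Direct evaluation of (36) by enumerating all intermediate states q_1..q_r."""
    K = len(P)
    out = [[F(0)] * K for _ in range(K)]
    for i, j in product(range(K), repeat=2):
        for qs in product(range(K), repeat=r):
            path = (i,) + qs + (j,)
            val = F(1)
            for s in range(r):
                # transition into q_{s+1}, then its emission of x_{u+s+1}
                val *= P[path[s]][path[s + 1]] * f[u + s + 1][path[s + 1]]
            val *= P[path[r]][j]  # final transition q_r -> j
            out[i][j] = max(out[i][j], val)
    return out


half = F(1, 2)
P = [[half, half, F(0)], [F(0), half, half], [half, F(0), half]]  # a zero in every row
f = [None,
     [F(1), F(1), F(1)],
     [F(1, 5), F(1), F(1, 2)],
     [F(7, 10), F(1, 10), F(9, 10)],
     [F(3, 10), F(2, 5), F(4, 5)]]
for r in range(4):
    assert p_r(P, f, 1, r) == p_r_brute(P, f, 1, r)
print(p_r(P, f, 1, 1))  # strictly positive entries although every row of P contains a zero
```

Exact fractions are used so that the comparison tests equality of the two maxima rather than closeness of floating-point values.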

The recursion (16) then generalizes to

$$\delta_{u+1}(j) = \max_{i \in S} \big( \delta_{u-r}(i)\, p^{(r)}_{ij}(u-r) \big) f_j(x_{u+1}), \quad r < u. \qquad (38)$$

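For $r \ge 0$ and $u + r \le n$ define

$$t^{(r)}(u,j) = \{ l \in S : \delta_u(l)\, p^{(r)}_{lj}(u) \ge \delta_u(i)\, p^{(r)}_{ij}(u),\ \forall i \in S \}, \qquad (39)$$

$$t^{(r)}(u,J) = \bigcup_{j \in J} t^{(r)}(u,j), \quad J \subset S.$$

It can be verified that for $q \ge 1$, $r \ge 0$, $u + q + r \le n$,

$$t^{(q+r)}(u,j) = t^{(q-1)}\big(u, t^{(r)}(u+q,j)\big), \qquad (40)$$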
Figure 2: In this example, $x_u$ is a second-order 2-node and a third-order 3-node. Thus, for given $x_{1\ldots n}$, the alignment includes $v_u = 2$. However, unlike in the case of ordinary nodes (of order 0), $x_{u+1}$ can now destroy the property of $x_u$ being the (second-order) node.

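where $t^{(0)}(u,j)$ coincides with $t(u,j)$ (18). Thus, $l_1 \in t^{(q-1)}(u, t^{(r)}(u+q,j))$ in (40) implies the existence of $l_2 \in t^{(r)}(u+q,j)$ such that $l_1 \in t^{(q-1)}(u,l_2)$. In short,

$$t^{(q-1)}\big(u, t^{(r)}(u+q,j)\big) = \bigcup_{l \in t^{(r)}(u+q,j)} t^{(q-1)}(u,l).$$

Note that with this new notation, (18) and (19) can be rewritten, respectively, as follows:

$$\mathcal{V}(x_1,\ldots,x_n) = \{ v \in S^n : \delta_n(v_n) \ge \delta_n(i)\ \forall i \in S,\ v_u \in t^{(0)}(u, v_{u+1}),\ 1 \le u < n \}, \qquad (41)$$

$$\mathcal{W}^l(x_1,\ldots,x_n) = \{ v \in S^n : v_n = l,\ v_u \in t^{(0)}(u, v_{u+1}),\ 1 \le u < n \}. \qquad (42)$$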
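The identity (40) can be sanity-checked on small examples. The Python sketch below computes $\delta_u$ by (16), $p^{(r)}(u)$ by (37) and $t^{(r)}(u,j)$ by (39) in exact rational arithmetic, and compares both sides of (40) for all admissible $u, q, r, j$. It assumes strictly positive $\pi$, $P$ and emission values (the random instances are hypothetical); with zero entries, degenerate all-zero maxima would need separate care, so this is a check under that assumption rather than a proof.

```python
from fractions import Fraction as F
from random import Random


def viterbi_scores(pi, P, f):
    """delta[u][i] for u = 1..n via (16); f[t][i] = f_i(x_t), f[0] unused."""
    K, n = len(P), len(f) - 1
    delta = [None, [pi[i] * f[1][i] for i in range(K)]]
    for u in range(1, n):
        delta.append([max(delta[u][i] * P[i][j] for i in range(K)) * f[u + 1][j]
                      for j in range(K)])
    return delta


def p_r(P, f, u, r):
    """p^{(r)}_{ij}(u) of (36), via the r' = 1 case of (37)."""
    K = len(P)
    if r == 0:
        return [list(row) for row in P]
    nxt = p_r(P, f, u + 1, r - 1)
    return [[max(P[i][q] * f[u + 1][q] * nxt[q][j] for q in range(K))
             for j in range(K)] for i in range(K)]


def t_r(delta, P, f, u, r, j):
    """t^{(r)}(u, j) of (39): all maximisers l of delta_u(l) p^{(r)}_{lj}(u)."""
    K = len(P)
    Pr = p_r(P, f, u, r)
    best = max(delta[u][i] * Pr[i][j] for i in range(K))
    return {l for l in range(K) if delta[u][l] * Pr[l][j] == best}


def composition_holds(pi, P, f):
    """Check (40) for every u >= 1, q >= 1, r >= 0 with u + q + r <= n, and every j."""
    K, n = len(P), len(f) - 1
    delta = viterbi_scores(pi, P, f)
    for u in range(1, n + 1):
        for q in range(1, n - u + 1):
            for r in range(n - u - q + 1):
                for j in range(K):
                    lhs = t_r(delta, P, f, u, q + r, j)
                    inner = t_r(delta, P, f, u + q, r, j)
                    rhs = set().union(*(t_r(delta, P, f, u, q - 1, l) for l in inner))
                    if lhs != rhs:
                        return False
    return True


def random_instance(seed, K=3, n=5):
    """Hypothetical strictly positive HMM ingredients with small rational entries."""
    rng = Random(seed)
    rows = [[F(rng.randint(1, 3)) for _ in range(K)] for _ in range(K)]
    P = [[x / sum(row) for x in row] for row in rows]
    pi = [F(1, K)] * K
    f = [None] + [[F(rng.randint(1, 4), 4) for _ in range(K)] for _ in range(n)]
    return pi, P, f


print(all(composition_holds(*random_instance(s)) for s in range(20)))  # True
```

Small integer weights are chosen on purpose so that ties occur; exact fractions make the tie comparisons reliable.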

We now generalize the concept of the node: 

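Definition 3.5 Let $0 \le r < n$, $u \le n - r$, and let $l \in S$. We call $x_u$ an $l$-node of order $r$ if

$$\delta_u(l)\, p^{(r)}_{lj}(u) \ge \delta_u(i)\, p^{(r)}_{ij}(u), \quad \forall i,j \in S. \qquad (43)$$

We also say that $x_u$ is a node of order $r$ if it is an $l$-node of order $r$ for some $l \in S$.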
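Definition 3.5 is straightforward to test directly. In the Python sketch below, `node_states` returns every $l$ for which (43) holds, given $\delta_u$ and the matrix $p^{(r)}(u)$. The three-state transition matrix has a zero in every row, so an observation with $\delta_u > 0$ cannot be a node of order 0, while the same $x_u$ becomes a node of order 1. The Viterbi scores and emission values are hypothetical, and states are 0-based.

```python
from fractions import Fraction as F


def p_r(P, f, u, r):
    """p^{(r)}_{ij}(u) of (36), via recursion (37) with r' = 1; f[t][q] = f_q(x_t)."""
    K = len(P)
    if r == 0:
        return [list(row) for row in P]
    nxt = p_r(P, f, u + 1, r - 1)
    return [[max(P[i][q] * f[u + 1][q] * nxt[q][j] for q in range(K))
             for j in range(K)] for i in range(K)]


def node_states(delta_u, Pr):
    """All l (0-based) satisfying (43) with Pr = p^{(r)}(u):
    delta_u(l) Pr[l][j] >= delta_u(i) Pr[i][j] for all i, j.
    An empty list means x_u is not a node of this order."""
    K = len(delta_u)
    return [l for l in range(K)
            if all(delta_u[l] * Pr[l][j] >= delta_u[i] * Pr[i][j]
                   for i in range(K) for j in range(K))]


half = F(1, 2)
P = [[half, half, F(0)], [F(0), half, half], [half, F(0), half]]  # a zero in every row
delta_u = [F(1), F(1, 100), F(1, 100)]  # hypothetical Viterbi scores at time u = 1
f = [None, None, [F(1), F(1), F(1)]]     # f[2][q] = f_q(x_2): an uninformative x_{u+1}

print(node_states(delta_u, P))                # []  : x_u is not a node of order 0
print(node_states(delta_u, p_r(P, f, 1, 1)))  # [0] : x_u is a node of order 1
```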

Note that a 0th-order node is just a node. One immediately obtains the following properties of the (generalized) nodes:

Proposition 3.3 Let $0 \le r$ and $1 \le q$ be such that $r + q \le n - u$. Then

1. If $x_u$ is an $r$th-order $l$-node, then it is also an $l$-node of order $r + q$.

2. If $x_{u+q}$ is an $r$th-order $l$-node, then $x_u$ is an $(r+q)$th-order $l'$-node for any $l' \in t^{(q-1)}(u,l)$.

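Next, we generalize Proposition 3.2:

Proposition 3.4

$$x_u \text{ is an } l\text{-node of order } r \iff l \in t^{(r)}(u,j)\ \ \forall j \in S, \qquad (44)$$

$$u + r < n,\ x_u \text{ is an } l\text{-node of order } r \ \Rightarrow\ \forall v \in \mathcal{V}(x_{1\ldots n}),\ \forall v^* \in \mathcal{W}^l(x_{1\ldots u})\ \exists v' \in \mathcal{W}^{v_{u+r+1}}_{(l)}(x_{u+1\ldots u+r+1}) :\ (v^*, v'_{1\ldots r}, v_{u+r+1\ldots n}) \in \mathcal{V}_l(x_{1\ldots n}), \qquad (45)$$

$$\mathcal{V}_l(x_{1\ldots n}) \neq \emptyset, \qquad (46)$$

$$\mathcal{V}_l(x_{1\ldots n}) = \mathcal{W}^l(x_{1\ldots u}) \times \mathcal{V}_{(l)}(x_{u+1\ldots n}). \qquad (47)$$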

Finding $v'$ and $v^* \in \mathcal{W}^l(x_{1\ldots u})$ in (45) for given $v \in \mathcal{V}(x_{1\ldots n})$ does not require knowledge of any of $x_{u+r+1\ldots n}$. Finally, whether $x_u$ is an $l$-node of order $r$ depends on $x_1,\ldots,x_{u+r}$ only, i.e. it does not depend on any $x_i$ for $i > u + r$.

Proof. The final statement follows immediately from Definition 3.5 and relations (16) and (36). (44) also follows immediately from Definition 3.5 and (39). In order to see (45), note that applying (40) with $q = 1$ to $l \in t^{(r)}(u, v_{u+r+1})$ once gives us $v'_1 \in t^{(r-1)}(u+1, v_{u+r+1})$ with $l \in t(u, v'_1)$. Applying then (40) with $q = 1$ to $v'_{i-1} \in t^{(r-i+1)}(u+i-1, v_{u+r+1})$ successively for $i = 2,\ldots,r$ proves the existence of the entire $v'_{1\ldots r} \in S^r$ such that $l \in t(u, v'_1)$, $v'_1 \in t(u+1, v'_2)$, ..., $v'_{r-1} \in t(u+r-1, v'_r)$, $v'_r \in t(u+r, v_{u+r+1})$. Thus, recalling (42), $v'_{1\ldots r}$ are the first $r$ components of some $v' \in \mathcal{W}^{v_{u+r+1}}_{(l)}(x_{u+1\ldots u+r+1})$. Since $v^*_i \in t(i, v^*_{i+1})$ for $i = 1,\ldots,u-1$ ($v^* \in \mathcal{W}^l(x_{1\ldots u})$ and (19)), and $v_i \in t(i, v_{i+1})$ for $i = u+r+1,\ldots,n-1$ and $\delta_n(v_n) \ge \delta_n(j)$ $\forall j \in S$ ($v \in \mathcal{V}(x_{1\ldots n})$ and (18)), one gets $(v^*, v'_{1\ldots r}, v_{u+r+1\ldots n}) \in \mathcal{V}_l(x_{1\ldots n})$. Evidently, $v'_{1\ldots r}$ above involves no $x_i$ for $i > u + r$. Thus, unlike in (26), in addition to setting $v_u = l$ and taking $v^*_i \in t(i, v^*_{i+1})$ for $i = u-1, u-2, \ldots, 1$, we may have to "realign" the $(u+1)$th, ..., $(u+r)$th components in order for the modified string to remain in $\mathcal{V}(x_{1\ldots n})$. Moreover, $v^*$ need not belong to $\mathcal{V}(x_{1\ldots u})$. Clearly, (45) implies (46). Finally, given (46), Proposition 3.1 yields (47). ■

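Corollary 3.3 For any fixed $s \in S$, Proposition 3.4 remains valid after replacing $\pi$ by $(p_{si})_{i \in S}$ wherever appropriate. In particular,

$$u + r < n,\ x_u \text{ is an } l\text{-node of order } r \ \Rightarrow\ \emptyset \neq \mathcal{V}_{(s),l}(x_{1\ldots n}) = \mathcal{W}^l_{(s)}(x_{1\ldots u}) \times \mathcal{V}_{(l)}(x_{u+1\ldots n}).$$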

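Corollary 3.4 Let $u_i + r_i < u_{i+1}$, $i = 1,\ldots,k-1$, and $u_k + r_k < n$, and suppose $x_{1\ldots n}$ is such that the observations $x_{u_i}$ are $l_i$-nodes of order $r_i$, for $i = 1,\ldots,k$. Then

$$\emptyset \neq \mathcal{V}_{l_1,\ldots,l_k}(x_{1\ldots n}) = \mathcal{W}^{l_1}(x_{1\ldots u_1}) \times \mathcal{W}^{l_2}_{(l_1)}(x_{u_1+1\ldots u_2}) \times \cdots \times \mathcal{W}^{l_k}_{(l_{k-1})}(x_{u_{k-1}+1\ldots u_k}) \times \mathcal{V}_{(l_k)}(x_{u_k+1\ldots n}). \qquad (48)$$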

Proof. By (46), we have

$$\mathcal{V}_{l_i}(x_{1\ldots n}) \neq \emptyset, \quad i = 1,\ldots,k.$$

Hence,

$$\emptyset \neq \mathcal{V}_{l_1,\ldots,l_k}(x_{1\ldots n}).$$

By (47),

Apply Corollary 3.3 to get

and repeat similarly to get

for $i = 2,\ldots,k-1$, yielding the desired result. ■

Thus, the assumptions of Proposition 3.4 and Corollary 3.4 establish the existence of piecewise alignments

$$v = (v_1,\ldots,v_{k+1}) \in \mathcal{V}(x_{1\ldots n}), \qquad (49)$$

where $v_1 \in \mathcal{W}^{l_1}(x_{1\ldots u_1})$, $v_2 \in \mathcal{W}^{l_2}_{(l_1)}(x_{u_1+1\ldots u_2})$, ..., $v_k \in \mathcal{W}^{l_k}_{(l_{k-1})}(x_{u_{k-1}+1\ldots u_k})$, $v_{k+1} \in \mathcal{V}_{(l_k)}(x_{u_k+1\ldots n})$. Moreover, for every $i = 1,\ldots,k$, the vectors $w(i) \overset{\text{def}}{=} (v_1,\ldots,v_i)$ satisfy $w(i) \in \mathcal{W}^{l_i}(x_{1\ldots u_i})$, and $w(i)_{1\ldots u_{i-1}} = w(i-1)$, $i = 2,\ldots,k$. Since $w(i)$ does not depend on $x_{u_i+r_i+1},\ldots,x_n$, as long as $x_1,\ldots,x_{u_i+r_i}$ are such that $x_{u_i}$ is a node of order $r_i$, an alignment $v(x_{1\ldots n})$ can always be found such that $v(x_{1\ldots n})_{1\ldots u_i} = w(i)$.



Definition 3.6 Any alignment of the form in (49) is called a piecewise alignment based on nodes $x_{u_1},\ldots,x_{u_k}$ of orders $r_1,\ldots,r_k$, respectively.

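Recall that we have previously fixed the selection scheme $\mathbb{V}$ (33). Based on this selection scheme, we will concern ourselves in §4.2 with proper (Definition 3.3) piecewise (Definition 3.6) alignments (that are based on nodes of possibly non-zero orders), formally defined as follows:

Definition 3.7

$$v(x_{1\ldots n}) = \big( \mathbb{V}\mathcal{W}^{l_1}(x_{1\ldots u_1}),\ \mathbb{V}\mathcal{W}^{l_2}_{(l_1)}(x_{u_1+1\ldots u_2}),\ \ldots,\ \mathbb{V}\mathcal{W}^{l_k}_{(l_{k-1})}(x_{u_{k-1}+1\ldots u_k}),\ \mathbb{V}\mathcal{V}_{(l_k)}(x_{u_k+1\ldots n}) \big) \in \mathcal{V}_{l_1,\ldots,l_k}(x_{1\ldots n}),$$

for $k > 0$, and $v(x_{1\ldots n}) \overset{\text{def}}{=} \mathbb{V}\mathcal{V}(x_{1\ldots n})$ for $k = 0$.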

To summarize the above, recall that by defining nodes (of various orders) we aim at extending alignments ad infinitum, and we would like to do this for as wide a class of HMMs with irreducible and aperiodic hidden layers as possible. Requiring $l$-nodes of order 0 immediately restricts the transition probabilities by requiring $p_{lj} > 0$ for all $j \in S$. However, this restriction disappears with the introduction of nodes of order $r$ for sufficiently large $r$. Indeed, suppose that for all $u \le n$ we have $\delta_u(j) > 0$ for all $j \in S$ (which in particular implies $f_j(x_u) > 0$ for all $j \in S$ and $u \le n$). Then, $x_u$ being an $l$-node of order $r$, and irreducibility of the underlying chain, imply $p^{(r)}_{lj}(u) > 0$ for all $j \in S$. The latter in turn implies that $(P^{r+1})_{lj} > 0$ for every $j \in S$. Thus, having an $l$-node of order $r$ for some $r$ does not impose any restriction on $P$: by virtue of irreducibility and aperiodicity of $P$, there always exists $r_0$ such that $P^r$ has all of its entries positive for every $r \ge r_0$.
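The existence of $r_0$ can be made computational. Only the zero pattern of $P$ matters, so boolean matrix products suffice. The Python sketch below returns the smallest $r$ with $P^r > 0$ entrywise, or `None` when $P$ is not primitive (irreducible and aperiodic); it relies on Wielandt's bound $(K-1)^2 + 1$ on the exponent of a primitive $K \times K$ matrix. Since every column of an irreducible $P$ has a positive entry, positivity of $P^r$ persists for all larger powers, so the returned value can serve as $r_0$. The example matrices are hypothetical.

```python
from typing import Optional, Sequence


def primitivity_index(P: Sequence[Sequence[float]]) -> Optional[int]:
    """Smallest r such that all entries of P^r are positive, or None if P is not
    primitive. Works on the zero pattern only; by Wielandt's bound it suffices
    to check r <= (K - 1)**2 + 1."""
    K = len(P)
    A = [[P[i][j] > 0 for j in range(K)] for i in range(K)]
    M = [row[:] for row in A]  # zero pattern of P^1
    for r in range(1, (K - 1) ** 2 + 2):
        if all(all(row) for row in M):
            return r
        # zero pattern of P^{r+1} = P^r P
        M = [[any(M[i][k] and A[k][j] for k in range(K)) for j in range(K)]
             for i in range(K)]
    return None


P3 = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]  # a zero in every row
print(primitivity_index(P3))                # 2
print(primitivity_index([[0, 1], [1, 0]]))  # None (periodic chain)
```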
3.3 Barriers

By Corollary 3.4, $x_u$ being a node of order $r$ fixes the alignment up to $u$ for any possible continuation of $x_{1\ldots u+r}$. However, changing the value of an observation before $x_{u+r+1}$, say $x_1$ or $x_{u+r}$, can prevent $x_u$ from being the node. Moreover, in general nothing guarantees that for an arbitrary prefix $x'_{1\ldots w} \in \mathcal{X}^w$, the $(w+u)$-th element of $(x'_{1\ldots w}, x_{1\ldots u+r})$ would be a node of order $r$. On the other hand, a block of observations $x_{1\ldots k} \in \mathcal{X}^k$ ($k > r$) can be such that for any $w \ge 0$ and for any $x'_{1\ldots w} \in \mathcal{X}^w$, the $(w+k-r)$-th element of $(x'_{1\ldots w}, x_{1\ldots k})$ is a node of order $r$. In that case, $x_{1\ldots k}$ will be called a barrier.

Definition 3.8 A block of observations $x_{1\ldots k} \in \mathcal{X}^k$ ($k > r$) is called an $l$-barrier of order $r$ and length $k$ if for any $w \ge 0$ and for any $x'_{1\ldots w} \in \mathcal{X}^w$, the $(w+k-r)$-th element of $(x'_{1\ldots w}, x_{1\ldots k})$ is an $l$-node of order $r$.

3.4 Existence of barriers 

In this section, we state the main technical result of the paper. For each $i \in S$, we denote by $G_i = \bigcap_{G \text{ closed},\ P_i(G) = 1} G$ the support of $P_i$.

Definition 3.9 We define a subset $C \subset S$ to be a cluster if, simultaneously,

$$\min_{j \in C} P_j\Big( \bigcap_{i \in C} \big( G_i \cap \{ x \in \mathcal{X} : f_i(x) > 0 \} \big) \Big) > 0 \quad \text{and} \quad P_j\Big( \bigcap_{i \in C} G_i \Big) = 0,\ \forall j \notin C.$$

(Note that $C$ is well defined, that is, if the first condition is satisfied with one choice of density functions $f_i$, it will certainly be satisfied with any other choice of densities $g_i$, since $\lambda(\{ x \in \mathcal{X} : f_i \neq g_i \}) = 0$ for all $i \in S$.) Hence, a cluster is a maximal subset of states such that the corresponding emission distributions have a "detectable" intersection of their supports $G_C = \bigcap_{i \in C} G_i$. Clusters need not be disjoint, and a cluster can consist of a single state. In this latter case such a state is not hidden: any emission from this state reveals it. If $K = 2$, then, for an HMM, there is only one cluster (otherwise the underlying Markov chain would not be hidden, as all observations would reveal their states). In many cases in practice there is only one cluster, namely $S$.

A proof of Lemma 3.1 below is given in Appendix 5.1.

Lemma 3.1 Assume that for each state $l \in S$,

$$P_l\Big( \big\{ x \in \mathcal{X} : p^*_l f_l(x) > \max_{j \neq l} p^*_j f_j(x) \big\} \Big) > 0, \quad \text{where } p^*_l = \max_{j \in S} p_{jl}. \qquad (50)$$

Moreover, assume that there exist a cluster $C \subset S$ and a finite integer $m < \infty$ such that the $m$-th power of the sub-stochastic matrix $Q = (p_{ij})_{i,j \in C}$ has all of its entries non-zero. Then, for some integers $M$ and $r$, $M > r \ge 0$, there exist a set $B = B_1 \times \cdots \times B_M \subset \mathcal{X}^M$, an $M$-tuple of states $q_{1\ldots M} \in S^M$, and a state $l \in S$, such that every vector $y \in B$ is an $l$-barrier of order $r$, $q_{M-r} = l$, and

$$P\big( (X_1,\ldots,X_M) \in B \mid Y_1 = q_1,\ldots,Y_M = q_M \big) > 0, \quad P(Y_1 = q_1,\ldots,Y_M = q_M) > 0.$$
Lemma 3.1 implies that $P((X_1,\ldots,X_M) \in B) > 0$. Also, since every element of $B$ is a barrier of order $r$, the ergodicity of $X$ guarantees that a.e. realization of $X$ contains infinitely many $l$-barriers of order $r$. Hence, a.e. realization of $X$ also has infinitely many $l$-nodes of order $r$.



3.4.1 Separated barriers 

If we were to apply Corollary 3.4 to a realization with infinitely many $l$-nodes of order $r$, we would first need to ensure that $u_{i+1} > u_i + r$ for $i = 1,2,\ldots$, where the $u_i$'s are the locations of the nodes. Obviously, one can easily select a subsequence of those nodes to enforce this condition. For certain technical reasons related to the construction of the infinite alignment process, however, we choose first to define special barriers for which the above "separation" condition is always satisfied. Then, we give a formal statement (Lemma 3.2 below) guaranteeing that these separated barriers also occur infinitely often. Let $B \subset \mathcal{X}^M$ and $M$ and $r$ be as in Lemma 3.1. Assume that for some $l \in S$ and some $j \ge 1$, $x_{j\ldots j+M-1} \in B$, i.e. $x_{j\ldots j+M-1}$ is an $l$-barrier of order $r$, and $x_{j+M-r-1}$ is an $l$-node of order $r$. However, it might happen that for some $i$, $j < i \le j + r$, $x_{i\ldots i+M-1}$ is also in $B$. Then $x_{i+M-r-1}$ is another node of order $r$. In this case, $(i+M-r-1) - (j+M-r-1) \le r$, and Corollary 3.4 cannot be used (in the presence of ties) with these two nodes simultaneously.



Definition 3.10 Let $B^* \subset \mathcal{X}^N$ be such that all its elements are $l$-barriers of order $r$ for some $l \in S$ and $r < N$. We say that $x_{1\ldots N} \in B^*$ is separated (relative to $B^*$) if for any $w$, $1 \le w \le r$, and for any $x'_{1\ldots w} \in \mathcal{X}^w$, the concatenated block $(x'_{1\ldots w}, x_{1\ldots N-w}) \notin B^*$.

Thus, roughly, a barrier is separated if it is at least $r + 1$ steps apart from any preceding $B^*$ barrier.

Suppose $B \subset \mathcal{X}^M$ is such that every $x_{1\ldots M} \in B$ is a barrier. The barriers from $B$ need not in general be separated. However, it can be possible to extend these barriers to make them separated relative to their own set $B^*$. For example, suppose further that there exists $x \in \mathcal{X}$ such that no $y \in B$ contains $x$, i.e. $y_i \neq x$, $i = 1,\ldots,M$. All the elements of $B^* \overset{\text{def}}{=} \{x\} \times B$ are evidently barriers, and moreover, they are now also separated (relative to $B^*$).

This will be used in Appendix §5.2 to prove Lemma 3.2 below, which states that under the assumptions of Lemma 3.1 separated barriers are also guaranteed to occur. In other words, a.e. realization of $X$ has infinitely many separated barriers.

Lemma 3.2 Suppose the assumptions of Lemma 3.1 are satisfied. Then, for some integers $M$ and $r$, $M > r \ge 0$, there exist a set $B = B_1 \times \cdots \times B_M \subset \mathcal{X}^M$, an $M$-tuple of states $q_{1\ldots M} \in S^M$, and a state $l \in S$, such that every vector $y \in B$ is a separated (relative to $B$) $l$-barrier of order $r$, $q_{M-r} = l$, and

$$P\big( (X_1,\ldots,X_M) \in B \mid Y_1 = q_1,\ldots,Y_M = q_M \big) > 0, \quad P(Y_1 = q_1,\ldots,Y_M = q_M) > 0.$$
3.4.2 Counterexamples



The condition on $C$ in Lemma 3.1 might seem technical and even unnecessary. We next give an example of an HMM where the cluster condition is not fulfilled and no barriers can occur. Then, we will modify the example (Examples 3.12 and 3.13) to enforce the cluster condition and consequently gain barriers.

Example 3.11 Let K = 4 and consider an ergodic Markov chain with transition matrix 



/ i i \ 



P 



1 1 

2 2 



i 



Vo 1 iy 



Let the emission distributions be such that (50) is satisfied and $G_1 = G_2$, $G_3 = G_4$, $G_1 \cap G_3 = \emptyset$. Hence, in this case there are two disjoint clusters, $C_1 = \{1,2\}$ and $C_2 = \{3,4\}$, with corresponding matrices $Q_1$ and $Q_2$.

Evidently, the cluster assumption of Lemma 3.1 is not satisfied. Note also that the alignment cannot change its state (in one step) to the other state of the same cluster. Due to the disjoint supports, any observation indicates the corresponding cluster. Hence, any sequence of observations can be regarded as a sequence of blocks emitted from alternating clusters. However, the alignment inside each block stays constant.



In order to see that no $x_u$ can be a node (of any order) for $1 \le u < n$, recall $t(u,j)$ (18), $t^{(r)}(u,j)$ (40), and Proposition 3.4. Specifically, note that in this setting, for any $j \in S$, $t(u,j)$ contains exactly one element; hence, for any $r \ge 1$, $t^{(r)}(u,\cdot)$ defines a function from $S$ to $S$. Now, it is easy to see that, depending on $x_u$, $t(u,j)$ belongs to a single cluster $C(x_u)$ for all $j \in S$. In particular, there are $i,j \in C' \subset S$ for some cluster $C'$ such that $i \neq j$. Given this particular transition matrix, evidently $t(u,i) \neq t(u,j)$. Hence, $x_u$ cannot be a (zero-order) node (by (44)). Now, starting with $u+1$ (instead of $u$), the same argument establishes that for some $i,j \in S$, $t(u+1,i) \neq t(u+1,j)$, but both are in one cluster. Applying the same argument again, but now to $t(u+1,i)$ and $t(u+1,j)$, we get that $t(u, t(u+1,i)) \neq t(u, t(u+1,j))$, i.e. $t^{(1)}(u,i) \neq t^{(1)}(u,j)$. Consequently, $x_u$ cannot be a first-order node (44), and so forth recursively for any $r$ such that $0 \le r < n - u$.



Example 3.12 Let us modify the HMM in Example 3.11 to ensure that the assumptions of Lemma 3.1 hold. First, let us change the transition matrix. Let $0 < \epsilon < \frac{1}{2}$ and consider the Markov chain $Y$ with transition matrix
Let the emission distributions be as in the previous example. In this case, the cluster $C_1$ satisfies the assumption of Lemma 3.1. As previously, every observation indicates its cluster. Unlike in the previous example, nodes are now possible. To be concrete, let $\epsilon = 1/4$, $f_1(x) = e^{-x}\mathbb{I}_{\{x>0\}}$, $f_2(x) = 2e^{-2x}\mathbb{I}_{\{x>0\}}$, and $f_3(x) = e^{x}\mathbb{I}_{\{x<0\}}$, $f_4(x) = 2e^{2x}\mathbb{I}_{\{x<0\}}$. It can then be verified that, for example, if $x_1 = 1$, $x_2 = 1$, then $x_1$ is a 1-node of order 2. Indeed, in that case any element of $B = (0,+\infty) \times (\log 2, +\infty) \times (0,+\infty)$ is a 1-barrier of order 2.

Example 3.13 Another way to modify the HMM in Example 3.11 to enforce the assumptions of Lemma 3.1 is to change the emission probabilities. Assume that the supports $G_i$, $i = 1,\ldots,4$, are such that $P_j(\bigcap_{i=1}^4 G_i) > 0$ for all $j \in S$, and (50) holds. Now the model has only one cluster, namely $S = \{1,\ldots,4\}$. Since some power of the (ergodic) transition matrix $P$ has all its entries positive, the conditions of Lemma 3.1 are now satisfied. A barrier can now be constructed. For example, the following block of observations,

$$z_1, z_2, z_3, x_1, \ldots, x_k, z'_1, z'_2, z'_3, \qquad (51)$$

where $z_i, z'_i \in \bigcap_{j=1}^4 G_j$, $i = 1,2,3$, $x_i \in \mathcal{X}$, $i = 1,\ldots,k$, and $k$ is sufficiently large, is a barrier (see the proof of Lemma 3.1). The construction of barriers in this case is possible because of the observations $z_i$ and $z'_i$. These observations can be emitted from any state (i.e. from any distribution $P_i$, $i = 1,\ldots,4$) and therefore do not indicate any proper subset of $S$. They play the role of a buffer allowing a change in the alignment from a given state to any other state (in 3 steps). The HMM in Example 3.11 does not have $r$-order nodes, because such buffers do not arise. The cluster assumption in Lemma 3.1 makes these buffers possible.



4 Alignment process 

Let $x_{1\ldots\infty} = x_1, x_2, \ldots$ be a realization of $X$. If for some $r < \infty$, $x_{1\ldots\infty}$ contains infinitely many $r$-order nodes, then Corollary 3.4 paves the way for defining an infinite alignment for $x_{1\ldots\infty}$.



4.1 Preliminaries 

Throughout this section, we work under the assumptions of Lemma 3.1. Let $M > 0$, $B \subset \mathcal{X}^M$, $r \ge 0$, $l \in S$, and $q = q_{1\ldots M} \in S^M$ be as promised by Lemma 3.2. Then, for every $n \ge 1$,

$$P\big( (Y_n,\ldots,Y_{n+M-1}) = q \big) > 0, \qquad P\big( (X_n,\ldots,X_{n+M-1}) \in B \mid (Y_n,\ldots,Y_{n+M-1}) = q \big) > 0,$$

and every $X_{n\ldots n+M-1} \in B$ is a separated (relative to $B$) $l$-barrier of order $r$.

Denote $P\big( (X_n,\ldots,X_{n+M-1}) \in B \mid (Y_n,\ldots,Y_{n+M-1}) = q \big)$ by $\gamma^*$. Thus, $\gamma^* > 0$. Define

$$U_n = (X_n,\ldots,X_{n+M-1}), \qquad D_n = (Y_n,\ldots,Y_{n+M-1}). \qquad (52)$$
Let $\mathcal{F}_n = \sigma(Y_1,\ldots,Y_n, X_1,\ldots,X_n)$. Define stopping times $\nu_0, \nu_1, \nu_2, \ldots$, $R_0, R_1, R_2, \ldots$, and $\vartheta_0, \vartheta_1, \vartheta_2, \ldots$ of the filtration $\{\mathcal{F}_{n+M-1}\}_{n=1}^{\infty}$ as follows:

$$\nu_0 = \min\{ n \ge 1 : U_n \in B,\ D_n = q \}, \quad \nu_i = \min\{ n > \nu_{i-1} : U_n \in B,\ D_n = q \}; \qquad (53)$$

$$\vartheta_0 = \min\{ n \ge 1 : U_n \in B \}, \quad \vartheta_i = \min\{ n > \vartheta_{i-1} : U_n \in B \}; \qquad (54)$$

$$R_0 = \min\{ n \ge 1 : D_n = q \}, \quad R_i = \min\{ n > R_{i-1} : D_n = q \}. \qquad (55)$$

We use the convention $\min\emptyset = \infty$ and $\max\emptyset = -1$. Note the difference between $\nu$, $R$, and $\vartheta$. The stopping times $\vartheta$ are observable via the $X$ process alone; the stopping times $R$ are observable via the $Y$ process alone; the stopping times $\nu$ already require knowledge of the full two-dimensional process $(X,Y)$. Clearly, $\vartheta_i \le \nu_i$ and $R_i \le \nu_i$.
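On a finite record, the stopping times (53) to (55) are just successive hitting times of indicator sequences. The Python sketch below is illustrative only: it takes hypothetical records of the events $\{U_n \in B\}$ and $\{D_n = q\}$ as inputs (in the text these come from the process $(X,Y)$), and it also computes $\tau_i$ and $T_i$ as defined in (56) and (57) below, here with $r = 0$.

```python
from typing import List, Sequence, Tuple


def hitting_times(flags: Sequence[bool]) -> List[int]:
    """Successive times n >= 1 at which flags[n] holds (flags[0] is unused).
    On a finite record only the times that actually occur are listed."""
    return [n for n in range(1, len(flags)) if flags[n]]


def renewal_quantities(in_B: Sequence[bool], d_is_q: Sequence[bool], M: int, r: int
                       ) -> Tuple[List[int], List[int], List[int], List[int], List[int]]:
    """nu (53), theta (54), R (55), tau_i = nu_i + (M - 1) - r (56) and T (57),
    from the indicators in_B[n] = [U_n in B] and d_is_q[n] = [D_n = q]."""
    nu = hitting_times([b and d for b, d in zip(in_B, d_is_q)])
    theta = hitting_times(in_B)
    R = hitting_times(d_is_q)
    tau = [v + (M - 1) - r for v in nu]
    T = tau[:1] + [tau[i] - tau[i - 1] for i in range(1, len(tau))]
    return nu, theta, R, tau, T


# Hypothetical indicator records for n = 1..7 (index 0 is a placeholder).
in_B = [False, False, True, False, True, True, False, True]
d_is_q = [False, True, True, False, False, True, False, True]
nu, theta, R, tau, T = renewal_quantities(in_B, d_is_q, M=3, r=0)
print(nu, theta, R)  # [2, 5, 7] [2, 4, 5, 7] [1, 2, 5, 7]
print(tau, T)        # [4, 7, 9] [4, 3, 2]
```

Since $\nu$ is defined by the intersection of the two events, its hitting times form a subsequence of both $\vartheta$ and $R$, which is why $\vartheta_i \le \nu_i$ and $R_i \le \nu_i$.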

From (55), it follows that the random variables $R_0$, $(R_1 - R_0)$, $(R_2 - R_1)$, ... are independent and $(R_1 - R_0)$, $(R_2 - R_1)$, ... are identically distributed. The same evidently holds for the random variables $\nu_0$, $(\nu_1 - \nu_0)$, $(\nu_2 - \nu_1)$, ....

Proposition 4.1 For any initial distribution $\pi'$ of $Y$, we have $E_{\pi'}\nu_0 < \infty$ and $E_{\pi'}(\nu_1 - \nu_0) < \infty$.

Proposition 4.1 above is an intuitive result; a proof is provided in Appendix §5.3. To every $\nu_i$, $i = 0,1,\ldots$, there corresponds an $l$-barrier of order $r$. This barrier extends over the interval $[\nu_i, \nu_i + M - 1]$. By Definition 3.8, $x_{\tau_i}$ is an $l$-node of order $r$, where

$$\tau_i = \nu_i + (M-1) - r, \quad i = 0,1,\ldots \qquad (56)$$

Define

$$T_0 = \tau_0, \qquad T_i = \tau_i - \tau_{i-1} = \nu_i - \nu_{i-1}, \quad i = 1,2,\ldots \qquad (57)$$

Proposition 4.1 says that $E_{\pi'}T_1 < \infty$ and $E_{\pi'}T_0 < \infty$, where $\pi'$ is any initial distribution of $Y$. Thus, $T_i$, $i = 0,1,\ldots$, correspond to a delayed renewal process (for a general reference see, for example, [7]).

Let $u_0, u_1, u_2, \ldots$ be the locations of the order-$r$ $l$-nodes corresponding to the stopping times $\vartheta$, i.e.

$$u_i = \vartheta_i + (M-1) - r, \quad i = 0,1,2,\ldots \qquad (58)$$

Clearly, every $\tau_i$ is also $u_j$ for some $j \ge i$. Also, since the barriers are separated, $u_i > u_{i-1} + r$.

4.2 Alignments 

We next specify the alignments $v(x_{1\ldots n}) \in \mathcal{V}(x_{1\ldots n})$ and define $v(x_{1\ldots\infty})$ as well as the measures corresponding to $v(x_{1\ldots n})$.
Let $k(x_{1\ldots n})$ be the number of $l$-nodes of order $r$, $x_{u_0}, x_{u_1}, \ldots, x_{u_{k(x_{1\ldots n})-1}}$, such that $u_i > u_{i-1} + r$ for $i = 1,\ldots,k(x_{1\ldots n})-1$, and $u_{k(x_{1\ldots n})-1} + r < n$. Recall (Definition 3.7) that, based on the selection $\mathbb{V}$ (33), we single out the following proper piecewise alignment:



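$$v(x_{1\ldots n}) = \big( \mathbb{V}\mathcal{W}^{l}(x_{1\ldots u_0}),\ \mathbb{V}\mathcal{W}^{l}_{(l)}(x_{u_0+1\ldots u_1}),\ \ldots,\ \mathbb{V}\mathcal{W}^{l}_{(l)}(x_{u_{k-2}+1\ldots u_{k-1}}),\ \mathbb{V}\mathcal{V}_{(l)}(x_{u_{k-1}+1\ldots n}) \big) \in \mathcal{V}_{l,\ldots,l}(x_{1\ldots n}),$$

for $k = k(x_{1\ldots n}) > 0$, and $v(x_{1\ldots n}) = \mathbb{V}\mathcal{V}(x_{1\ldots n})$ for $k = 0$. Corollary 3.4 makes it possible to define the infinite proper piecewise alignment that will be consistent with Definition 3.7 (in the sense of (59) below). Namely, we state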

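Definition 4.1

$$v(x_{1\ldots\infty}) = \big( \mathbb{V}\mathcal{W}^{l}(x_{1\ldots u_0}),\ \mathbb{V}\mathcal{W}^{l}_{(l)}(x_{u_0+1\ldots u_1}),\ \ldots \big)$$

for all $x_{1\ldots\infty}$ that contain infinitely many $l$-nodes $x_{u_0}, x_{u_1}, \ldots$ of order $r$, which is the case a.s. (Lemmas 3.1 and 3.2). (For all other realizations, let us adopt $v(x_{1\ldots\infty}) \overset{\text{def}}{=} \big( \mathbb{V}\mathcal{W}^{l}(x_{1\ldots u_0}), \mathbb{V}\mathcal{W}^{l}_{(l)}(x_{u_0+1\ldots u_1}), \ldots, \mathbb{V}\mathcal{W}^{l}_{(l)}(x_{u_{k-2}+1\ldots u_{k-1}}), 1, 1, \ldots \big)$, where $k$ is the total number of $l$-nodes of order $r$ in the given realization.)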

Note that for every $u_i$ observed in $(x_1,\ldots,x_n)$,

$$v(x_{1\ldots\infty})_{1\ldots u_i} = v(x_1,\ldots,x_n)_{1\ldots u_i}. \qquad (59)$$

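Let us now formally define the empirical measures $P^n_l$, which are central to this theory:

Definition 4.2 Let $V^n = (V^n_1,\ldots,V^n_n) = v(X_1,\ldots,X_n)$ (where $v$ is as in Definition 3.7). For each state $l \in S$ that appears in $V^n_1, V^n_2, \ldots, V^n_n$, define the empirical $l$-measure

$$P^n_l(A; X_{1\ldots n}) \overset{\text{def}}{=} \frac{\sum_{i=1}^n \mathbb{I}_{A \times \{l\}}(X_i, V^n_i)}{\sum_{i=1}^n \mathbb{I}_{\{l\}}(V^n_i)}, \quad A \in \mathcal{B}.$$

For other $l \in S$ (i.e. such that $l \neq V^n_i$ for $i = 1,\ldots,n$), define $P^n_l$ to be an arbitrarily chosen (probability) measure $P^*$.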
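Computationally, $P^n_l$ is simply the empirical distribution of the observations aligned to state $l$. A minimal Python sketch, assuming the alignment has already been computed and representing a set $A$ by its indicator function; the data, the alignment and the function names are hypothetical.

```python
from typing import Callable, Hashable, Optional, Sequence

Event = Callable[[float], bool]      # a set A, represented by its indicator
Measure = Callable[[Event], float]


def empirical_measure(xs: Sequence[float], v: Sequence[Hashable], l: Hashable,
                      fallback: Optional[Measure] = None) -> Optional[Measure]:
    """P_l^n(A) = #{i : x_i in A, v_i = l} / #{i : v_i = l}, as in Definition 4.2.
    If state l never occurs in the alignment v, return `fallback`, which plays
    the role of the arbitrary measure P^*."""
    pts = [x for x, s in zip(xs, v) if s == l]
    if not pts:
        return fallback
    return lambda A: sum(1 for x in pts if A(x)) / len(pts)


xs = [0.1, 2.0, 0.3, 1.5]
v = [1, 2, 1, 2]  # hypothetical alignment v(x_1, ..., x_4)
P1 = empirical_measure(xs, v, 1)
P2 = empirical_measure(xs, v, 2)
print(P1(lambda x: x < 1.0), P2(lambda x: x > 1.8))  # 1.0 0.5
print(empirical_measure(xs, v, 3))                   # None: state 3 does not occur
```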

The infinite alignment allows us to define the alignment process:

Definition 4.3 The encoded process $V \overset{\text{def}}{=} v(X)$ will be called the alignment process.

(Of course, the definition of $V$ above is sensible only because $X$ has infinitely many $u_i$'s a.s.) We shall also consider the two-dimensional process

$$Z = (X, V).$$

Using $Z$, we define a related quantity as follows. Let $V_1,\ldots,V_n$ be the first $n$ elements of the alignment process. In general

$$v(x_{1\ldots\infty})_{1\ldots n} \neq v(x_1,\ldots,x_n),$$

hence $V^n_i$ need not equal $V_i$. For every $l \in S$, we define

$$Q^n_l(A; X_{1\ldots n}) \overset{\text{def}}{=} \frac{\sum_{i=1}^n \mathbb{I}_{A \times \{l\}}(X_i, V_i)}{\sum_{i=1}^n \mathbb{I}_{\{l\}}(V_i)}, \quad A \in \mathcal{B}.$$

(As in Definition 4.2, if $l \neq V_i$, $i = 1,\ldots,n$, then $Q^n_l = P^*$.)
4.3 Regenerativity

To prove our main theorem, we use the fact that Z is a regenerative process (for a general 
reference, see, for example, [3]): 

Proposition 4.2 The processes $V$, $X$, and $Z$ are regenerative with respect to the sequence of stopping times $\tau_i$.

A proof is given in Appendix §5.4.

Recall (§4.1) $B$, the set of separated $l$-barriers of order $r$, and the corresponding state sequence $q$. Let

$$P^B_{q_i}(\cdot) = P_{q_i}(\cdot \mid B_i), \quad i = 1,\ldots,M.$$

Thus, $P^B_{q_i}$ is the measure $P_{q_i}$ conditioned on the $i$-th component of $B$. Recall also that

$$q_{M-r} = l.$$

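Define new processes

$$Y^r = (Y^r_i)_{i=1}^{\infty}, \ \text{where } Y^r_1 = q_{M-r+1},\ \ldots,\ Y^r_r = q_M, \ \text{and } Y^r_{r+1}, Y^r_{r+2}, \ldots \qquad (60)$$

is an $S$-valued Markov chain with transition probability matrix $P$ and initial distribution $(p_{q_M j})_{j \in S}$;

$X^r \overset{\text{def}}{=} (X^r_i)_{i=1}^{\infty}$ is a modified HMM with $Y^r$ as its underlying Markov chain and $P_{Y^r_i}$ if $i > r$, and $P^B_{q_{M-r+i}}$ if $1 \le i \le r$, as its emission distributions;

$$V^r \overset{\text{def}}{=} (V^r_i)_{i=1}^{\infty} = v(X^r), \ \text{where } v \text{ is as in Definition 4.1}; \qquad (61)$$

$$Z^r = (X^r, V^r). \qquad (62)$$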

Note that the process $X^r$ is not exactly an HMM as defined in Definition 2.1, because the first $r$ emissions are generated from distributions that differ from the distributions of the subsequent emissions. However, conditioned on the underlying Markov chain $Y^r$, all emissions are still independent. Also note that in the definition of $V^r$, the alignment is still based on the original HMM $X$, i.e. the definition of $v(x_1,\ldots,x_n)$ relies on the distributions $P_{q_1}, P_{q_2}, \ldots, P_{q_n}$ (given $Y_{1\ldots n} = q_{1\ldots n}$).

Finally, note that for $r = 0$, the process $Y^0$ is essentially our original Markov chain, except for the initial distribution, which is now $(p_{lj})_{j \in S}$ instead of $\pi$. Similarly, $X^0$ is the HMM in the sense of Definition 2.1 with $Y^0$ as its underlying Markov chain. Therefore, $Z^0$ is the process $Z$ with $(p_{lj})_{j \in S}$ as the initial distribution of its $Y$-component.

Finally we define analogues of $\nu_0$ and $\tau_0$:

$$\nu^r_0 = \min\{ n \ge 1 : (Y^r_n,\ldots,Y^r_{n+M-1}) = q,\ (X^r_n,\ldots,X^r_{n+M-1}) \in B \},$$

$$\tau^r_0 = \nu^r_0 + (M-1) - r. \qquad (63)$$

Note that the random variable $\tau^r_0$ has the same law as $T_i$ (57), $i \ge 1$. Since the barriers from $B$ are separated (Definition 3.10, Lemma 3.2), $\nu^r_0 > r$. This means that the laws of $\nu^r_0$ and $\nu_0 + r$, as well as those of $\tau^r_0$ and $\tau_0 + r$, would be equal if the initial distribution of $Y$ were $(p_{q_M i})_{i \in S}$. Recalling that any initial distribution $\pi'$ of $Y$ yields $E_{\pi'}\nu_0 < \infty$ (Proposition 4.1), we obtain

$$ET_1 = E\tau^r_0 = E_{\pi'}\big( \nu_0 + (M-1) - r + r \big) < \infty, \quad \pi' = (p_{q_M i})_{i \in S}. \qquad (64)$$

The above observations will allow us to prove (see Appendix §5.5) the following theorem, which is the main result of the paper.

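Theorem 4.4 If $X$ satisfies the assumptions of Lemma 3.1, then there exist probability measures $Q_l$, $l \in S$, such that

$$P^n_l \Rightarrow Q_l \ \text{a.s.}, \qquad Q^n_l \Rightarrow Q_l \ \text{a.s.},$$

and for each $A \in \mathcal{B}$,

$$Q_l(A) = \frac{E \sum_{i=1}^{\infty} \mathbb{I}_{A \times \{l\}}(Z^r_i)\, \mathbb{I}_{\{\tau^r_0 \ge i\}}}{E \sum_{i=1}^{\infty} \mathbb{I}_{\{l\}}(V^r_i)\, \mathbb{I}_{\{\tau^r_0 \ge i\}}}, \qquad (65)$$

where $V^r$, $Z^r$, and $\tau^r_0$ are defined in (61), (62), and (63), respectively.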

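Corollary 4.1 Suppose $X$ satisfies the assumptions of Lemma 3.1 with $r = 0$. Then, for each $l \in S$, (65) takes the form

$$Q_l(A) = \frac{\sum_{j=1}^{\infty} P_l(X_j \in A,\ V_j = l,\ \tau_0 \ge j)}{\sum_{j=1}^{\infty} P_l(V_j = l,\ \tau_0 \ge j)},$$

where $P_l$ corresponds to the $Y$ process initialized with $(p_{lj})_{j \in S}$ instead of $\pi$.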



5 Appendix 

5.1 Proof of Lemma 3.1

Proof. The proof below is a rather direct construction which is, however, technically involved. In order to facilitate the exposition of this proof, we have divided it into 18 short parts, as outlined below:

I. (§5.1.1) Maximal probability transitions $p^*$ and maximal likelihood ratio $A$.

II. Construction of
(§5.1.2) auxiliary subsets $\mathcal{X}_l$ (68);
(§5.1.3) a special set $Z \subset \mathcal{X}$ (70), (71);
(§5.1.4) auxiliary sequences $s$ (72), $a$ (73), and $b$ (§5.1.4) of states in $S$;
(§5.1.5) $k$, the number of $s$ cycles inside the $s$-path;
(§5.1.6) the $s$-path (75), a prototype of the required sequence $q_{1\ldots M}$;
(§5.1.7) the required barrier (79).

III. Proving the barrier construction (79):
(§5.1.8) $\alpha, \beta, \gamma, \eta$-notation for commonly used maximal partial likelihoods;
(§5.1.9) a bound (85);
(§5.1.10) bounds (86), (87), (88), and (89) on common likelihood ratios;
(§5.1.11) $\gamma_j \le \text{const} \times \gamma_1$;
(§5.1.12) further bounds (101), (105) on likelihoods;
(§5.1.13) $\eta_j \le \text{const} \times \eta_1$;
(§5.1.14) a special representation of $\eta_1$ (107);
(§5.1.15) an implication of (103) and (107) for $\delta_{kL}$;
(§5.1.16) proof that $x_{kL}$ is a $(kL+m+P)$-order 1-node;
(§5.1.17) proof of an auxiliary inequality (114).

IV. (§5.1.18) Completion of the $s$-path to $q_{1\ldots M}$ and conclusion.

5.1.1 Maximal probability transitions p* and maximal likelihood ratio A. 

Define

$$p^*_i = \max_{j \in S} p_{ji},\ i \in S, \qquad A = \max_{i \in S} \max_{j \in S} \Big\{ \frac{p^*_i}{p_{ji}} : p_{ji} > 0 \Big\}. \qquad (67)$$
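The quantities in (67) depend on $P$ only. A minimal Python sketch follows, with a hypothetical 2×2 matrix; note that $p^*_i$ is a column maximum, i.e. the largest probability of a transition into $i$, and that $A \ge 1$ always (take $j$ attaining $p^*_i$).

```python
from typing import List, Sequence


def max_incoming(P: Sequence[Sequence[float]]) -> List[float]:
    """p*_i = max_j p_{ji}: the largest transition probability INTO state i, as in (67)."""
    K = len(P)
    return [max(P[j][i] for j in range(K)) for i in range(K)]


def max_ratio(P: Sequence[Sequence[float]]) -> float:
    """A = max_i max_j { p*_i / p_{ji} : p_{ji} > 0 } from (67)."""
    K = len(P)
    pstar = max_incoming(P)
    return max(pstar[i] / P[j][i] for i in range(K) for j in range(K) if P[j][i] > 0)


P = [[0.9, 0.1], [0.4, 0.6]]
print(max_incoming(P))  # [0.9, 0.6]
print(max_ratio(P))     # about 6.0, attained by p*_2 / p_12 = 0.6 / 0.1
```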

5.1.2 $\mathcal{X}_l \subset \mathcal{X}$

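It follows from the assumption (50) and finiteness of $S$ that there exists an $\epsilon > 0$ such that for all $l \in S$

$$P_l(\mathcal{X}_l) > 0, \quad \text{where } \mathcal{X}_l = \Big\{ x \in \mathcal{X} : \max_{j \neq l} \{ p^*_j f_j(x) \} < (1-\epsilon)\, p^*_l f_l(x) \Big\}. \qquad (68)$$

(Note that $p^*_l > 0$ for all $l \in S$ by irreducibility of $Y$.) Also note that the sets $\mathcal{X}_l$, $l \in S$, are disjoint and have positive reference measure $\lambda(\mathcal{X}_l) > 0$.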

5.1.3 $Z \subset \mathcal{X}$ and $\delta$, $K$ bounds on cluster densities $f_i$, $i \in C$

Let $C$ be a cluster as in the assumptions of the Lemma, with the corresponding sub-stochastic matrix $Q$. The existence of $C$ implies the existence of a set $Z \subset G_C\ (= \bigcap_{i \in C} G_i)$ and $\delta > 0$ such that $\lambda(Z) > 0$ and, $\forall z \in Z$, the following statements hold:

(i) $\min_{i \in C} f_i(z) > \delta$;

(ii) $\max_{j \notin C} f_j(z) = 0$.
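Indeed, if no $Z$ and $\delta > 0$ existed with property (i), we would have

$$\lambda\Big( \bigcap_{i \in C} \big( G_i \cap \{ z \in \mathcal{X} : f_i(z) > 0 \} \big) \Big) = 0,$$

contradicting the first defining property of a cluster: $P_j\big( G_C \cap \bigcap_{i \in C} \{ x \in \mathcal{X} : f_i(x) > 0 \} \big) > 0$ (with any $j \in C$). Now, if $Z$ did not satisfy (ii), we would remove from it $Z \cap \bigcup_{i \notin C} \{ z \in \mathcal{X} : f_i(z) > 0 \}$, as this would not reduce its $\lambda$ measure. This is due to the second condition in the definition of a cluster, which implies $\lambda(G_C \cap \{ z \in \mathcal{X} : f_i(z) > 0 \}) = 0$ for all $i \notin C$.

Evidently, $K > 0$ can be chosen sufficiently large to make $\lambda(\{ z \in \mathcal{X} : f_i(z) > K \})$ arbitrarily small, and in particular, to guarantee that $\lambda(\{ z \in \mathcal{X} : f_i(z) > K \}) < \lambda(Z)/|C|$ for every $i \in C$. Clearly then, redefining $Z \overset{\text{def}}{=} Z \cap \{ z \in \mathcal{X} : f_i(z) \le K,\ i \in C \}$ preserves $\lambda(Z) > 0$. Next, consider

$$\lambda\Big( Z \setminus \bigcup_{i \in S} \mathcal{X}_i \Big). \qquad (69)$$

If (69) is positive, then define

$$Z \overset{\text{def}}{=} Z \setminus \bigcup_{i \in S} \mathcal{X}_i. \qquad (70)$$

If (69) is zero, then there must be $s \in C$ such that

$$\lambda(Z \cap \mathcal{X}_s) > 0,$$

and in this case, let

$$Z \overset{\text{def}}{=} Z \cap \mathcal{X}_s. \qquad (71)$$

Such $s \in S$ must clearly exist, since $\lambda(Z) > 0$ but $\lambda(Z \setminus \bigcup_{i \in S} \mathcal{X}_i) = 0$. To see that $s$ must necessarily be in the cluster $C$, note that $\forall s \notin C$, $f_s(z) = 0$ $\forall z \in Z$, which implies $Z \cap \mathcal{X}_s = \emptyset$.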



5.1.4 Sequences s, a, and b of states in S

Let us define an auxiliary sequence of states gi, q2, and so on, as follows: If (l69l) is 
zero, that is, if 2 = ^ fl X^ for some s G C, then define qi = s, otherwise let qi be an 
arbitrary state in C. Let q2 be a state with maximal probability of transition to qi, i.e.: 
Pg2qi = (see (1671) for the p* notation). Suppose q2 7^ qi- Then find q^ with Pq^q^ = Pg^- 
If gs ^ {qi,q2}, find ^4 : ^^493 = p^^, and so on. Let U be the first index such that 
Qu ^ {^i) • • • ^Qu-i}: that is, qu = qx for some T < U. This means that there exists a 
sequence of states {qr, ■ ■ ■ , qu} such that 

• qT = qu 

• qr+i = argmaxjpjgy^,_^, i = 1, . . . ,U - T. 

To simplify the notation, and without loss of generality, assume $q_U=1$. Reorder and rename the states as follows:
$$s_1:=q_{U-1},\ s_2:=q_{U-2},\ \dots,\ s_i:=q_{U-i},\ \dots,\ s_L:=q_T=1,\qquad i=1,\dots,L,\quad L:=U-T, \tag{72}$$
$$a_1:=q_{T-1},\ a_2:=q_{T-2},\ \dots,\ a_P:=q_1,\qquad P:=T-1. \tag{73}$$
Hence,
$$(q_1,\dots,q_{T-1},q_T,q_{T+1},\dots,q_{U-1},q_U)=(a_P,\dots,a_1,1,s_{L-1},\dots,s_1,1).$$
Note that if $T=1$, then $P=0$ and $(q_1,\dots,q_{U-1},q_U)=(1,s_{L-1},\dots,s_1,1)$. We have thus introduced the special sequences $a=(a_1,a_2,\dots,a_P)$ and $s=(s_1,s_2,\dots,s_{L-1},1)$. Clearly,
$$p_{s_{i-1}s_i}=p^*_{s_i},\quad i=2,\dots,L,\qquad p_{1s_1}=p^*_{s_1}, \tag{74}$$
$$p_{a_{i-1}a_i}=p^*_{a_i},\quad i=2,\dots,P,\qquad p_{a_0a_1}=p^*_{a_1},\ \text{where } a_0:=s_L=1.$$
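The construction of $q_1,q_2,\dots$ and of the sequences $s$ and $a$ is mechanical. As a minimal sketch, assuming NumPy, 0-based states and a transition matrix with `P[i, j]` $=p_{ij}$, the hypothetical helper `s_and_a_sequences` below follows maximal-probability predecessors until a state repeats. Ties are broken by taking the first maximizer (the text leaves this choice open), and the renaming $q_U=1$ is not performed, so the last entry of `s` is simply $q_T$.

```python
import numpy as np


def s_and_a_sequences(P, q1):
    """Hypothetical helper for Section 5.1.4 (states are 0-based indices).

    Starting from q1, repeatedly move to a state with maximal transition
    probability INTO the current state (p_{q_{n+1} q_n} = p*_{q_n}) until a
    state repeats.  Returns (q, T, U, s, a) with 1-based T < U and q_U = q_T.
    """
    P = np.asarray(P, dtype=float)
    q = [q1]                                   # q[n - 1] stores q_n
    while True:
        pred = int(np.argmax(P[:, q[-1]]))     # first maximiser of column q_n
        q.append(pred)
        if pred in q[:-1]:                     # q_U = q_T for some T < U
            break
    U = len(q)
    T = q.index(q[-1]) + 1
    L, P_len = U - T, T - 1
    s = [q[U - i - 1] for i in range(1, L + 1)]      # s_i := q_{U-i}; s_L = q_T
    a = [q[T - i - 1] for i in range(1, P_len + 1)]  # a_i := q_{T-i}; a_P = q_1
    return q, T, U, s, a
```

By construction every consecutive transition along $a_P,\dots,a_1,s_L,s_{L-1},\dots,s_1,s_L$ read backwards attains the column maximum, which is exactly property (74).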

Next, we exhibit $b=(b_1,\dots,b_R)$, another auxiliary sequence, for some $R\ge 1$, characterized as follows:

(i) $b_R=1$;

(ii) there exists $b_0\in C$ such that $p_{b_0b_1}p_{b_1b_2}\cdots p_{b_{R-1}b_R}>0$;

(iii) if $R>1$, then $b_0\ne b_i$ for every $i=1,\dots,R$.

Thus, the path $b_1,\dots,b_R$ connects the cluster $C$ to state 1 in $R$ steps. Let us also require that $R$ be minimal. Such $b$ and $b_0$ clearly exist due to the irreducibility of $Y$: for any two states in $S$ in general, and for any state in $C$ and state 1 in particular, there exists a (finite) transition path of positive probability. Note also that the minimality of $R$ guarantees (iii) (in the special case of $R=1$ it may happen that $b_1=1\in C$ and $p_{11}>0$, in which case $b_0$ can also be taken to be 1).
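A sequence $b$ with minimal $R$ can be found by a breadth-first search over positive-probability transitions, started one step outside $C$. The function `shortest_path_from_cluster` is a hypothetical sketch (NumPy, 0-based states); breadth-first search is simply one convenient way to realize the minimality of $R$, not necessarily the construction the authors had in mind.

```python
from collections import deque

import numpy as np


def shortest_path_from_cluster(P, C, target):
    """Breadth-first search for the sequence b of Section 5.1.4 (0-based states).

    Returns (b0, [b_1, ..., b_R]) with b0 in C, b_R == target, all transitions
    of positive probability and R >= 1 as small as possible; None if unreachable.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    depth, parent = {}, {}
    queue = deque()
    for c in C:                                  # layer 1: one step out of C
        for j in range(n):
            if P[c, j] > 0 and j not in depth:
                depth[j], parent[j] = 1, c
                queue.append(j)
    while queue:
        h = queue.popleft()
        if h == target:
            path = [h]
            while depth[path[0]] > 1:            # walk back to layer 1
                path.insert(0, parent[path[0]])
            return parent[path[0]], path         # the parent of b_1 is b_0 in C
        for j in range(n):
            if P[h, j] > 0 and j not in depth:
                depth[j], parent[j] = depth[h] + 1, h
                queue.append(j)
    return None
```

When the target already lies in $C$ and some state of $C$ reaches it in one step, the search returns $R=1$, matching the special case noted in (iii).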



5.1.5 k, the number of s cycles inside the s-path 

Let $Q^m$ be the $m$-th power of the sub-stochastic matrix $Q=(p_{ij})_{i,j\in C}$, and let $q_{ij}$ be the entries of $Q^m$. By assumption, $q_{ij}>0$ for every $i,j\in C$. This means that for every $i,j\in C$ there exists a path from $i$ to $j$ of length $m$ that has positive probability. Let $q^*_{ij}$ be the probability of a maximum probability path from $i$ to $j$. In other words, for every $i,j\in C$ there exist states $w_1,\dots,w_{m-1}\in C$ such that
$$p_{iw_1}p_{w_1w_2}\cdots p_{w_{m-2}w_{m-1}}p_{w_{m-1}j}=q^*_{ij}>0. \tag{75}$$
Denote by $q$ the smallest of these probabilities:
$$q:=\min_{i,j\in C}q^*_{ij}>0. \tag{76}$$
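The quantities $q^*_{ij}$ in (75) are maximum-product path probabilities, so they can be computed by $m-1$ rounds of a max-product analogue of matrix multiplication on $Q$. The sketch below assumes NumPy and indexes the result by the positions of the states within the list `C`; the function names are hypothetical.

```python
import numpy as np


def max_path_probabilities(P, C, m):
    """q*_ij of (75): largest probability of an m-step path i -> j inside C."""
    Q = np.asarray(P, dtype=float)[np.ix_(C, C)]   # Q = (p_ij)_{i,j in C}
    best = Q.copy()                                # m = 1: one transition
    for _ in range(m - 1):
        # max-product "multiplication": best[i, j] = max_h best[i, h] * Q[h, j]
        best = np.max(best[:, :, None] * Q[None, :, :], axis=1)
    return best


def q_constant(P, C, m):
    """q of (76): the smallest q*_ij over i, j in C."""
    return float(max_path_probabilities(P, C, m).min())
```

Replacing the maximum by a sum would give the entries $q_{ij}$ of $Q^m$ instead; positivity of $Q^m$ is exactly what makes $q>0$.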
Next, choose $k$ sufficiently large for the following to hold:
$$(1-\epsilon)^{k-1}<q^2\Big(\frac{\delta}{K}\Big)^{2m}A^{-R}, \tag{77}$$
where $A$ and $\epsilon$ are as in (67) and (68), respectively, and $\delta$ and $K$ are introduced in §5.1.3.
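Since (67) is not reproduced in this section, the sketch below assumes that $A$ is the smallest constant for which (93) holds, namely $A=\max\{p^*_h/p_{jh}: p_{jh}>0\}$, and it takes the right-hand side of (77) in the form displayed above. Under these assumptions, the hypothetical functions `constant_A` and `choose_k` compute $A$ and the smallest $k\ge 2$ satisfying (77); the restriction $k\ge 2$ reflects the requirement $k>1$ used in §5.1.14.

```python
import numpy as np


def constant_A(P):
    """Smallest A with p*_h <= A * p_jh whenever p_jh > 0 (the bound (93))."""
    P = np.asarray(P, dtype=float)
    p_star = P.max(axis=0)                      # p*_h = max_i p_ih
    j, h = np.nonzero(P > 0)                    # all pairs with p_jh > 0
    return float(np.max(p_star[h] / P[j, h]))


def choose_k(eps, q, delta, K, m, A, R):
    """Smallest k >= 2 satisfying (77) in the form assumed here."""
    bound = q**2 * (delta / K) ** (2 * m) * A ** (-R)
    k = 2
    while (1 - eps) ** (k - 1) >= bound:        # keep increasing until (77) holds
        k += 1
    return k
```

Any larger $k$ also satisfies (77), so the minimal choice is only a convenience.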

5.1.6 The s-path 

We now fix the state sequence
$$b_0,b_1,\dots,b_R,s_1,s_2,\dots,s_{2Lk},a_1,\dots,a_P, \tag{78}$$
where $s_{Lj+i}=s_i$, $j=1,\dots,2k-1$, $i=1,\dots,L$ (in particular, $s_{Lj}=1$, $j=1,\dots,2k$). The sequence (78) will be called the s-path. The s-path is a concatenation of $2k$ s-cycles $s_1,\dots,s_L$, the beginning and the end of which are connected to the cluster $C$ via the positive probability paths $b$ and $a$, respectively (recall that $b_0\in C$, $b_R=1$, and $a_P=q_1\in C$ by construction). Additionally, the $b_R,s_1,s_2,\dots,s_{2Lk},a_1,\dots,a_P$ segment of the s-path (78) has the important property (74), i.e., every consecutive transition along this segment occurs with the maximal transition probability given its destination state. (However, $b$, the beginning of the s-path, need not satisfy this property.) The s-path comes very close to being the sequence $q_{1\dots M}$ required by the Lemma and will be completed to $q_{1\dots M}$ in §5.1.18. In fact, the idea of the Lemma and its proof is to exhibit (a cylinder subset of) observations such that, once emitted along the s-path, these observations trap the Viterbi backtracking so that the latter winds up on the s-path. This will guarantee that an observation emitted along the s-path is, as desired, a node.

5.1.7 The barrier 

Consider the following sequence of observations
$$z_0,z_1,\dots,z_m,x'_1,\dots,x'_{R-1},x_0,x_1,\dots,x_{2Lk},x''_1,\dots,x''_P,z'_1,\dots,z'_m, \tag{79}$$
where $z_0,z_i,z'_i\in Z$, $i=1,\dots,m$; $x'_i\in\mathcal X_{b_i}$, $i=1,\dots,R-1$; and
$$x_0\in\mathcal X_1,\quad x_{i+Lj}\in\mathcal X_{s_i},\ j=0,\dots,2k-1,\ i=1,\dots,L;\quad x''_i\in\mathcal X_{a_i},\ i=1,\dots,P.$$
From here through §5.1.16, we shall be proving that $x_{kL}$ is a 1-node of order $kL+m+P$ and, therefore, that (79) is a 1-barrier of order $kL+m+P$.

Let $u\ge 2Lk+2m+1+P+R$ and let $x_1,\dots,x_u$ be any sequence of observations terminating in the $(2Lk+2m+1+P+R)$-long sequence (79).

5.1.8 α, β, γ, η

Recall the definition of the scores $\delta_u(i)$ (15) and the maximum partial likelihoods $p^{(r)}_{ij}(u)$ (36). We now abbreviate some of this notation as follows. Namely, we denote by $\delta_i(x_l)$ (resp. $\delta_i(z_l)$) the scores corresponding to the observation $x_l$ (resp. $z_l$). Similarly, we denote by $p^{(r)}_{ij}(x_l)$ (resp. $p^{(r)}_{ij}(z_l)$) the maximum partial likelihoods corresponding to the observation $x_l$ (resp. $z_l$). Formally, for any $i,j\in S$ and appropriate $r\ge 0$, the abbreviated notation is as follows:
$$\delta_i(x_l):=\delta_{u-P-m-2kL+l}(i),\quad p^{(r)}_{ij}(x_l):=p^{(r)}_{ij}(u-P-m-2kL+l),\quad 0\le l\le 2kL; \tag{80}$$
$$p^{(r)}_{ij}(x'_l):=p^{(r)}_{ij}(u-P-m-2kL-R+l),\quad 1\le l\le R-1;$$
$$\delta_i(z_l):=\delta_{u-2Lk-2m-P-R+l}(i),\quad p^{(r)}_{ij}(z_l):=p^{(r)}_{ij}(u-2Lk-2m-P-R+l),\quad 0\le l\le m;$$
$$\delta_i(z'_l):=\delta_{u-m+l}(i),\quad p^{(r)}_{ij}(z'_l):=p^{(r)}_{ij}(u-m+l),\quad 1\le l\le m. \tag{81}$$
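From (90) and (91) below, $p^{(r)}_{ij}$ is the largest probability of going from $i$ to $j$ through $r$ intermediate states while emitting the $r$ intermediate observations. A minimal NumPy sketch, assuming the emission densities at those observations are supplied as a matrix `F` (`F[t][h]` $=f_h$ at the $t$-th intermediate observation), builds it one observation at a time; the helper name is hypothetical.

```python
import numpy as np


def max_partial_likelihood(P, F):
    """p^{(r)}_{ij} with r = len(F) intermediate observations.

    Returns M with M[i, j] = max over t_1..t_r of
    p_{i t_1} f_{t_1} p_{t_1 t_2} ... f_{t_r} p_{t_r j}.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    M = P.copy()                                # r = 0: M[i, j] = p_ij
    for f in np.asarray(F, dtype=float).reshape(-1, n):
        # one more intermediate emission: M[i, j] <- max_h M[i, h] * f_h * p_hj
        M = np.max(M[:, :, None] * (f[:, None] * P)[None, :, :], axis=1)
    return M
```

Each loop iteration is a one-step instance of the max-product recursion that the proof applies repeatedly, e.g. when expanding (102).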
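Also, we will frequently use the scores corresponding to $z_0$, $z_m$, $x_0$, and $x_{kL}$, hence the following further abbreviations:
$$\alpha_i:=\delta_i(z_0),\quad\beta_i:=\delta_i(z_m),\quad\gamma_i:=\delta_i(x_0),\quad\eta_i:=\delta_i(x_{kL}).$$
Note that for all $j\notin C$, $f_j(z_0)=f_j(z_l)=0$, $l=1,\dots,m$, by the construction of $Z$ (§5.1.3). Hence $\alpha_j=\beta_j=0$ for all $j\notin C$, and a more general implication is that for every $j\in S$
$$\beta_j=\max_i\alpha_ip^{(m-1)}_{ij}(z_0)f_j(z_m)=\alpha_{\nu(j)}p^{(m-1)}_{\nu(j)j}(z_0)f_j(z_m)\quad\text{for some }\nu(j)\in C; \tag{82}$$
$$\gamma_j=\max_i\beta_ip^{(R-1)}_{ij}(z_m)f_j(x_0)=\beta_{\nu(j)}p^{(R-1)}_{\nu(j)j}(z_m)f_j(x_0)\quad\text{for some }\nu(j)\in C. \tag{83}$$
Also note the following representation of $\eta_j$ in terms of $\gamma$, which we will use:
$$\eta_j=\max_i\gamma_ip^{(kL-1)}_{ij}(x_0)f_j(x_{kL})=\gamma_{\nu(j)}p^{(kL-1)}_{\nu(j)j}(x_0)f_j(x_{kL})\quad\text{for some }\nu(j)\in S. \tag{84}$$
(Here and below, $\nu(j)$ denotes a maximizing index, which may differ from one representation to another.)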

5.1.9 Bounds on β

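Recall (§5.1.4) that $b_0\in C$. We show that for every $j\in S$
$$\beta_j\le q^{-1}\Big(\frac{K}{\delta}\Big)^m\beta_{b_0}. \tag{85}$$
Fix $j\in S$ and consider $\alpha_{\nu(j)}$ from (82). Let $w_1,\dots,w_{m-1}$ be a path that realizes $p^{(m-1)}_{\nu(j)j}(z_0)$. Then
$$\beta_j=\alpha_{\nu(j)}p_{\nu(j)w_1}f_{w_1}(z_1)p_{w_1w_2}f_{w_2}(z_2)\cdots p_{w_{m-1}j}f_j(z_m)\le\alpha_{\nu(j)}K^m.$$
(The last inequality follows from the definition of $Z$, §5.1.3.) Let $w'_1,\dots,w'_{m-1}$ be a maximum probability path from $\nu(j)$ to $b_0$ as in (75). Thus,
$$\beta_{b_0}\ge\alpha_{\nu(j)}p_{\nu(j)w'_1}f_{w'_1}(z_1)p_{w'_1w'_2}f_{w'_2}(z_2)\cdots p_{w'_{m-1}b_0}f_{b_0}(z_m)\ge\alpha_{\nu(j)}q^*_{\nu(j)b_0}\delta^m.$$
(The last inequality again follows from the definition of $Z$, §5.1.3.) Since $q^*_{\nu(j)b_0}\ge q>0$ (76), we thus obtain
$$\beta_j\le\alpha_{\nu(j)}K^m\le q^{-1}\Big(\frac{K}{\delta}\Big)^m\beta_{b_0},$$
as required.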



5.1.10 Likelihood ratio bounds 

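We prove the following claims:
$$p^{(L-1)}_{i1}(x_{lL})\le p^{(L-1)}_{11}(x_{lL}),\quad\forall i\in S,\ \forall l=0,\dots,2k-1; \tag{86}$$
$$\frac{p^{(L-1)}_{ij}(x_{lL})f_j(x_{(l+1)L})}{p^{(L-1)}_{11}(x_{lL})f_1(x_{(l+1)L})}<1-\epsilon,\quad\forall i,j\in S,\ j\ne 1,\ \forall l=0,\dots,2k-1; \tag{87}$$
$$p^{(R-1)}_{ij}(z_m)f_j(x_0)\le A^Rp^{(R-1)}_{b_01}(z_m)f_1(x_0),\quad\forall i,j\in S; \tag{88}$$
$$p^{(m+P-1)}_{ij}(x_{2kL})\le p^{(m+P-1)}_{1j}(x_{2kL})\,q^{-1}\Big(\frac{K}{\delta}\Big)^{m-1},\quad\forall j\in C,\ \forall i\in S. \tag{89}$$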

If $L=1$, then (86) becomes $p_{i1}\le p_{11}$ for all $i\in S$, which is true by the assumption $p^*_1=p_{11}$ made in the course of constructing the sequence $s$ (§5.1.4). If $L=1$, then (87) becomes
$$\frac{p_{ij}f_j(x_{l+1})}{p_{11}f_1(x_{l+1})}<1-\epsilon,$$
and thus, since $x_{l+1}\in\mathcal X_1$, $0\le l<2k$, in this case, (87) is true by the definition of $\mathcal X_1$ (§5.1.2) (and the fact that $p^*_1=p_{11}$). Let us next prove (86) and (87) for the case $L>1$. Consider any $l=0,1,\dots,2k-1$. Note that the definitions of the s-path (78) and of $\mathcal X_i$ (§5.1.2), together with the fact that $x_{lL+i}\in\mathcal X_{s_i}$ for $1\le i<L$, imply that, given the observations $x_{lL+1},\dots,x_{(l+1)L-1}$, the path $s_1,\dots,s_{L-1}$ realizes the maximum in $p^{(L-1)}_{11}(x_{lL})$, i.e.,
$$p^{(L-1)}_{11}(x_{lL})=p_{1s_1}f_{s_1}(x_{lL+1})p_{s_1s_2}\cdots p_{s_{L-2}s_{L-1}}f_{s_{L-1}}(x_{(l+1)L-1})p_{s_{L-1}1}. \tag{90}$$
(Indeed,
$$p_{1s_1}f_{s_1}(x_{lL+1})p_{s_1s_2}\cdots p_{s_{L-2}s_{L-1}}f_{s_{L-1}}(x_{(l+1)L-1})p_{s_{L-1}1}=p^*_{s_1}f_{s_1}(x_{lL+1})p^*_{s_2}\cdots p^*_{s_{L-1}}f_{s_{L-1}}(x_{(l+1)L-1})p^*_1,$$
and for $i=1,2,\dots,L-1$, $p^*_{s_i}f_{s_i}(x_{lL+i})\ge p_{hj}f_j(x_{lL+i})$ for any $h,j\in S$.) Suppose $j\ne 1$ and $t_1,\dots,t_{L-1}$ realizes $p^{(L-1)}_{ij}(x_{lL})$, i.e.,
$$p^{(L-1)}_{ij}(x_{lL})=p_{it_1}f_{t_1}(x_{lL+1})p_{t_1t_2}\cdots p_{t_{L-2}t_{L-1}}f_{t_{L-1}}(x_{(l+1)L-1})p_{t_{L-1}j}. \tag{91}$$
Hence, with $t_0$ and $t_L$ standing for $i$ and $j$, respectively (and $s_0=s_L=1$), the left-hand side of (87) becomes
$$\prod_{h=1}^{L}\frac{p_{t_{h-1}t_h}f_{t_h}(x_{lL+h})}{p_{s_{h-1}s_h}f_{s_h}(x_{lL+h})}.$$
For $h=1,\dots,L$ such that $t_h\ne s_h$,
$$\frac{p_{t_{h-1}t_h}f_{t_h}(x_{lL+h})}{p_{s_{h-1}s_h}f_{s_h}(x_{lL+h})}<1-\epsilon,\quad\text{since }x_{lL+h}\in\mathcal X_{s_h}. \tag{92}$$
For all other $h$, $s_h=t_h$, and therefore the left-hand side of (92) becomes $p_{t_{h-1}s_h}/p^*_{s_h}\le 1$ (by property (74)). Since the last term of the product above does satisfy (92) ($j\ne 1$), (87) is thus proved. Suppose next that $t_1,\dots,t_{L-1}$ realizes $p^{(L-1)}_{i1}(x_{lL})$. With $s_0=1$ and $t_0=i$, similarly to the previous arguments, we have
$$\frac{p^{(L-1)}_{i1}(x_{lL})}{p^{(L-1)}_{11}(x_{lL})}=\Bigg(\prod_{h=1}^{L-1}\frac{p_{t_{h-1}t_h}f_{t_h}(x_{lL+h})}{p_{s_{h-1}s_h}f_{s_h}(x_{lL+h})}\Bigg)\frac{p_{t_{L-1}1}}{p_{s_{L-1}1}}\le 1,$$
implying (86).

Let us now prove (88). To that end, note that for all states $h,i,j\in S$ such that $p_{jh}>0$, it follows from the definitions (67) that
$$p_{ih}\le p^*_h\le Ap_{jh}. \tag{93}$$
If $R=1$, then (88) becomes
$$p_{ij}f_j(x_0)\le Ap_{b_01}f_1(x_0).$$
By the definition of $\mathcal X_1$ (recall that $x_0\in\mathcal X_1$), we have $p_{ij}f_j(x_0)\le p^*_1f_1(x_0)$ for every $i,j\in S$. Using (93) with $h=1$ and $j=b_0$, we get $p^*_1f_1(x_0)\le Ap_{b_01}f_1(x_0)$ ($p_{b_01}>0$ by the construction of $b$, §5.1.4). Putting these together, we obtain
$$p_{ij}f_j(x_0)\le p^*_1f_1(x_0)\le Ap_{b_01}f_1(x_0),$$
as required.

Consider the case $R>1$. Let $t_1,\dots,t_{R-1}$ be a path that realizes $p^{(R-1)}_{ij}(z_m)$, i.e.,
$$p^{(R-1)}_{ij}(z_m)=p_{it_1}f_{t_1}(x'_1)p_{t_1t_2}f_{t_2}(x'_2)\cdots p_{t_{R-2}t_{R-1}}f_{t_{R-1}}(x'_{R-1})p_{t_{R-1}j}.$$
By the definition of $\mathcal X_i$ (§5.1.2) and the facts that $x'_r\in\mathcal X_{b_r}$, $r=1,2,\dots,R-1$, and $x_0\in\mathcal X_1$, we have
$$p^{(R-1)}_{ij}(z_m)f_j(x_0)\le p^*_{b_1}f_{b_1}(x'_1)p^*_{b_2}f_{b_2}(x'_2)\cdots p^*_{b_{R-1}}f_{b_{R-1}}(x'_{R-1})p^*_1f_1(x_0). \tag{94}$$
Now, by the construction of $b$ (§5.1.4), $p_{b_{r-1}b_r}>0$ for $r=1,\dots,R$ ($b_R=1$). Thus, the argument behind (93) applies here to bound the right-hand side of (94) from above by
$$Ap_{b_0b_1}f_{b_1}(x'_1)\,Ap_{b_1b_2}f_{b_2}(x'_2)\cdots Ap_{b_{R-2}b_{R-1}}f_{b_{R-1}}(x'_{R-1})\,Ap_{b_{R-1}1}f_1(x_0)\le A^Rp^{(R-1)}_{b_01}(z_m)f_1(x_0),$$
as required.

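Let us now prove (89). If $m=1$, then (89) becomes
$$p^{(P)}_{ij}(x_{2kL})\le p^{(P)}_{1j}(x_{2kL})\,q^{-1},\quad\forall j\in C,\ \forall i\in S. \tag{95}$$
If $P=0$, then (95) reduces to $p_{ij}\le p_{1j}q^{-1}$, which is true because in this case the state $q_1=q_T=1$ belongs to $C$ (§5.1.4) and $p_{1j}q^{-1}\ge 1$ ((75), (76) with $m=1$). To see why (95) is true with $P\ge 1$, note that by the same argument as was used to prove (86) and (87), we now get
$$p^{(P-1)}_{1a_P}(x_{2kL})f_{a_P}(x''_P)\ge p^{(P-1)}_{hl}(x_{2kL})f_l(x''_P),\quad\forall h,l\in S. \tag{96}$$
Also, since $a_P=q_1\in C$ (§5.1.4), $p_{a_Pj}q^{-1}\ge 1$ ((75), (76) with $m=1$). Thus
$$p^{(P)}_{ij}(x_{2kL})=\max_lp^{(P-1)}_{il}(x_{2kL})f_l(x''_P)p_{lj}\overset{(96)}{\le}p^{(P-1)}_{1a_P}(x_{2kL})f_{a_P}(x''_P)\max_lp_{lj}\le p^{(P-1)}_{1a_P}(x_{2kL})f_{a_P}(x''_P)p_{a_Pj}q^{-1}\le p^{(P)}_{1j}(x_{2kL})\,q^{-1}.$$
For $m>1$, let $t_1,\dots,t_{m-1}$ be a path realizing $p^{(m-1)}_{hj}(x''_P)$. Thus,
$$p^{(m-1)}_{hj}(x''_P)=p_{ht_1}f_{t_1}(z'_1)p_{t_1t_2}f_{t_2}(z'_2)\cdots f_{t_{m-1}}(z'_{m-1})p_{t_{m-1}j}\le K^{m-1}. \tag{97}$$
(This is true since $z'_r\in Z$ for $r=1,2,\dots,m-1$ (§5.1.3); thus, for $p^{(m-1)}_{hj}(x''_P)$ to be positive, it is necessary that $t_r\in C$, $r=1,\dots,m-1$, implying $f_{t_r}(z'_r)\le K$.) Now let $t_1,t_2,\dots,t_{m-1}\in C$ be a maximum probability path from $a_P$ to $j$ as in (75); this is possible since $a_P,j\in C$ (recall the positivity assumption on $Q^m$, §5.1.5). Since $z'_r\in Z$ for $r=1,2,\dots,m-1$, we thus have
$$p^{(m-1)}_{a_Pj}(x''_P)\ge p_{a_Pt_1}f_{t_1}(z'_1)p_{t_1t_2}f_{t_2}(z'_2)\cdots f_{t_{m-1}}(z'_{m-1})p_{t_{m-1}j}=q^*_{a_Pj}f_{t_1}(z'_1)f_{t_2}(z'_2)\cdots f_{t_{m-1}}(z'_{m-1})\ge q\delta^{m-1}. \tag{98}$$
Combining the bounds (97) and (98) ($q>0$, (76)), we obtain
$$p^{(m-1)}_{hj}(x''_P)\le p^{(m-1)}_{a_Pj}(x''_P)\,q^{-1}\Big(\frac{K}{\delta}\Big)^{m-1},\quad\forall h\in S. \tag{99}$$
Finally,
$$p^{(P+m-1)}_{ij}(x_{2kL})=\max_lp^{(P-1)}_{il}(x_{2kL})f_l(x''_P)p^{(m-1)}_{lj}(x''_P)\overset{(96)}{\le}p^{(P-1)}_{1a_P}(x_{2kL})f_{a_P}(x''_P)\max_lp^{(m-1)}_{lj}(x''_P)$$
$$\overset{(99)}{\le}p^{(P-1)}_{1a_P}(x_{2kL})f_{a_P}(x''_P)p^{(m-1)}_{a_Pj}(x''_P)\,q^{-1}\Big(\frac{K}{\delta}\Big)^{m-1}\le p^{(P+m-1)}_{1j}(x_{2kL})\,q^{-1}\Big(\frac{K}{\delta}\Big)^{m-1}.$$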

5.1.11 $\gamma_j\le\text{const}\times\gamma_1$

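Combining (83), (85), and (88), we get that for every state $j\in S$,
$$\gamma_j=\beta_{\nu(j)}p^{(R-1)}_{\nu(j)j}(z_m)f_j(x_0)\le A^R\beta_{\nu(j)}p^{(R-1)}_{b_01}(z_m)f_1(x_0)\le A^Rq^{-1}\Big(\frac{K}{\delta}\Big)^m\beta_{b_0}p^{(R-1)}_{b_01}(z_m)f_1(x_0)\le U\gamma_1,$$
where
$$U:=q^{-1}\Big(\frac{K}{\delta}\Big)^mA^R. \tag{100}$$
Hence
$$\gamma_j\le U\gamma_1,\quad\forall j\in S. \tag{101}$$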
5.1.12 Further bounds on likelihoods

Let $l\ge 0$ and $n\ge 1$ be integers such that $l+n\le 2k$, but arbitrary otherwise. Expanding $p^{(nL-1)}_{11}(x_{lL})$ recursively according to (37), we obtain
$$p^{(nL-1)}_{11}(x_{lL})=\max_{i_1,\dots,i_{n-1}\in S}p^{(L-1)}_{1i_1}(x_{lL})f_{i_1}(x_{(l+1)L})p^{(L-1)}_{i_1i_2}(x_{(l+1)L})f_{i_2}(x_{(l+2)L})\cdots p^{(L-1)}_{i_{n-1}1}(x_{(l+n-1)L}). \tag{102}$$
Since $p^{(L-1)}_{1i_1}(x_{lL})f_{i_1}(x_{(l+1)L})\le p^{(L-1)}_{11}(x_{lL})f_1(x_{(l+1)L})$ for any $i_1\in S$, as well as $p^{(L-1)}_{i_{r-1}i_r}(x_{(l+r-1)L})f_{i_r}(x_{(l+r)L})\le p^{(L-1)}_{11}(x_{(l+r-1)L})f_1(x_{(l+r)L})$, $r=2,\dots,n-1$, by (86) and (87), and since $p^{(L-1)}_{i_{n-1}1}(x_{(l+n-1)L})\le p^{(L-1)}_{11}(x_{(l+n-1)L})$ for any $i_{n-1}\in S$ by (86), the maximum in (102) is achieved as in (103):
$$p^{(nL-1)}_{11}(x_{lL})=p^{(L-1)}_{11}(x_{lL})f_1(x_{(l+1)L})p^{(L-1)}_{11}(x_{(l+1)L})f_1(x_{(l+2)L})\cdots p^{(L-1)}_{11}(x_{(l+n-2)L})f_1(x_{(l+n-1)L})p^{(L-1)}_{11}(x_{(l+n-1)L}). \tag{103}$$
Now we replace state 1 by generic states $i,j\in S$ at both ends of the paths in (102) and repeat the above arguments. Thus, also using (103), we arrive at the bound
$$p^{(nL-1)}_{ij}(x_{lL})f_j(x_{(l+n)L})\le\prod_{u=l+1}^{l+n}p^{(L-1)}_{11}(x_{(u-1)L})f_1(x_{uL})=p^{(nL-1)}_{11}(x_{lL})f_1(x_{(l+n)L}). \tag{104}$$
In particular, (104) states
$$p^{(kL-1)}_{ij}(x_0)f_j(x_{kL})\le p^{(kL-1)}_{11}(x_0)f_1(x_{kL}),\quad\forall i,j\in S. \tag{105}$$

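5.1.13 $\eta_j\le\text{const}\times\eta_1$

In order to see that
$$\eta_j\le U\eta_1,\quad\forall j\in S, \tag{106}$$
note that
$$\eta_j=\max_i\gamma_ip^{(kL-1)}_{ij}(x_0)f_j(x_{kL})\overset{(105)}{\le}\max_i\gamma_ip^{(kL-1)}_{11}(x_0)f_1(x_{kL})\overset{(101)}{\le}U\gamma_1p^{(kL-1)}_{11}(x_0)f_1(x_{kL})\overset{(84)}{\le}U\eta_1.$$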
5.1.14 A representation of $\eta_1$

Recall that $k$, the number of cycles in the s-path, was chosen sufficiently large for (77) to hold (in particular, $k>1$). We now prove that there exists $\kappa\in\{1,\dots,k-1\}$ such that
$$\eta_1=\delta_1(x_{\kappa L})p^{((k-\kappa)L-1)}_{11}(x_{\kappa L})f_1(x_{kL}). \tag{107}$$
The relation (107) states that (given the observations $x_1,x_2,\dots,x_u$) a maximum-likelihood path (from time 1, observation $x_1$) to time $u-m-P-kL$ (observation $x_{kL}$) goes through state 1 at time $u-m-P-2kL+\kappa L$, that is, when $x_{\kappa L}$ is observed.

To see this, suppose no such $\kappa$ exists. Then, applying (37) to (84) and recalling that $\delta_1(x_{kL})$ is introduced by (80), we would have
$$\eta_1=\gamma_{\nu(1)}p^{(L-1)}_{\nu(1)j_1}(x_0)f_{j_1}(x_L)p^{(L-1)}_{j_1j_2}(x_L)f_{j_2}(x_{2L})\cdots p^{(L-1)}_{j_{k-1}1}(x_{(k-1)L})f_1(x_{kL})$$
for some $j_1\ne 1,\dots,j_{k-1}\ne 1$. Furthermore, this would imply
$$\eta_1\overset{(86),(87)}{\le}\gamma_{\nu(1)}(1-\epsilon)^{k-1}\prod_{i=1}^{k}p^{(L-1)}_{11}(x_{(i-1)L})f_1(x_{iL})\overset{(101)}{\le}\gamma_1U(1-\epsilon)^{k-1}\prod_{i=1}^{k}p^{(L-1)}_{11}(x_{(i-1)L})f_1(x_{iL})$$
$$\overset{(100),(77)}{<}\gamma_1\prod_{i=1}^{k}p^{(L-1)}_{11}(x_{(i-1)L})f_1(x_{iL}). \tag{108}$$
(The last inequality follows from $q<1$ (76) and $\delta<K$, §5.1.3.) On the other hand, by definition (84) (and a $(k-1)$-fold application of (37)), $\eta_1\ge\gamma_1\prod_{i=1}^{k}p^{(L-1)}_{11}(x_{(i-1)L})f_1(x_{iL})$, which evidently contradicts (108). Therefore, $\kappa$ satisfying (107) and $1\le\kappa<k$ does exist.

5.1.15 An implication of (103) and (107) for $\delta_1(x_{lL})$

Clearly, the arguments of the previous section (§5.1.14) remain valid if $k$ is replaced by any $l\in\{k,\dots,2k\}$. Hence the following generalization of (107):
$$\delta_1(x_{lL})=\delta_1(x_{\kappa(l)L})p^{((l-\kappa(l))L-1)}_{11}(x_{\kappa(l)L})f_1(x_{lL})\quad\text{for some }\kappa(l)<l. \tag{109}$$
We apply (109) recursively, starting with $\kappa^{(0)}:=l$ and returning $\kappa^{(1)}:=\kappa(l)<l$. If $\kappa^{(1)}<k$, we stop; otherwise we substitute $\kappa^{(1)}$ for $l$ and obtain $\kappa^{(2)}:=\kappa(\kappa^{(1)})<\kappa^{(1)}$, and so on, until $\kappa^{(j)}<k$ for some $j>0$. Thus,
$$\delta_1(x_{lL})=\delta_1(x_{\kappa^{(j)}L})p^{((\kappa^{(j-1)}-\kappa^{(j)})L-1)}_{11}(x_{\kappa^{(j)}L})f_1(x_{\kappa^{(j-1)}L})\cdots p^{((l-\kappa^{(1)})L-1)}_{11}(x_{\kappa^{(1)}L})f_1(x_{lL}). \tag{110}$$
Applying (103) to the appropriate factors of the right-hand side of (110), we get
$$\delta_1(x_{lL})=\delta_1(x_{\kappa^{(j)}L})p^{(L-1)}_{11}(x_{\kappa^{(j)}L})f_1(x_{(\kappa^{(j)}+1)L})\cdots p^{(L-1)}_{11}(x_{(k-1)L})f_1(x_{kL})\cdots$$
$$\cdots p^{(L-1)}_{11}(x_{(\kappa^{(1)}-1)L})f_1(x_{\kappa^{(1)}L})\cdots p^{(L-1)}_{11}(x_{(l-1)L})f_1(x_{lL}). \tag{111}$$
Also, according to (103),
$$\delta_1(x_{\kappa^{(j)}L})p^{(L-1)}_{11}(x_{\kappa^{(j)}L})f_1(x_{(\kappa^{(j)}+1)L})\cdots p^{(L-1)}_{11}(x_{(k-1)L})=\delta_1(x_{\kappa^{(j)}L})p^{((k-\kappa^{(j)})L-1)}_{11}(x_{\kappa^{(j)}L}).$$
At the same time,
$$\delta_1(x_{\kappa^{(j)}L})p^{((k-\kappa^{(j)})L-1)}_{11}(x_{\kappa^{(j)}L})f_1(x_{kL})\le\eta_1. \tag{112}$$
However, we cannot have strict inequality in (112), since that, via (111), would contradict the maximality of $\delta_1(x_{lL})$. We have thus arrived at
$$\delta_1(x_{lL})=\eta_1p^{(L-1)}_{11}(x_{kL})f_1(x_{(k+1)L})\cdots p^{(L-1)}_{11}(x_{(l-1)L})f_1(x_{lL}). \tag{113}$$
In summary, for any $l$ with $k\le l\le 2k$ there exists a realization of $\delta_1(x_{lL})$ that goes through state 1 every time $x_{iL}$, $i=k,\dots,l$, is observed.

5.1.16 $x_{kL}$ is a $(kL+m+P)$-order 1-node

Once we prove in §5.1.17 that for any $i\in S$, $i\ne 1$, and any $j\in C$,
$$\eta_ip^{(kL+m+P-1)}_{ij}(x_{kL})\le\eta_1p^{(kL+m+P-1)}_{1j}(x_{kL}), \tag{114}$$
this will immediately imply that $x_{kL}$ is a 1-node of order $kL+m+P$. Indeed, let $l\in S$ be arbitrary. Since $f_j(z'_m)=0$ for every $j\in S\setminus C$, any maximum likelihood path to state $l$ at time $u+1$ (observation $x_{u+1}$) must go through a state in $C$ at time $u$ (observation $x_u=z'_m$). Formally,
$$\eta_ip^{(kL+m+P)}_{il}(x_{kL})=\max_{j\in S}\eta_ip^{(kL+m+P-1)}_{ij}(x_{kL})f_j(z'_m)p_{jl}=\max_{j\in C}\eta_ip^{(kL+m+P-1)}_{ij}(x_{kL})f_j(z'_m)p_{jl}$$
$$\overset{(114)}{\le}\max_{j\in C}\eta_1p^{(kL+m+P-1)}_{1j}(x_{kL})f_j(z'_m)p_{jl}=\eta_1p^{(kL+m+P)}_{1l}(x_{kL}).$$
Therefore, by Definition 3.5, $x_{kL}$ is a 1-node of order $kL+m+P$.
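The inequality established above can be checked numerically once the scores and the partial likelihoods are available. The sketch below assumes that Definition 3.5 (not reproduced here) amounts to $\delta_t(i)p^{(r)}_{ij}(t)\le\delta_t(l)p^{(r)}_{lj}(t)$ for all $i,j\in S$, which is the form verified in this section; `is_node` is a hypothetical name.

```python
import numpy as np


def is_node(delta, Pr, l):
    """Check delta_t(i) * p^{(r)}_{ij}(t) <= delta_t(l) * p^{(r)}_{lj}(t) for all i, j.

    delta[i] is the Viterbi score at the candidate time t and
    Pr[i, j] = p^{(r)}_{ij}(t), e.g. from a max-product recursion.
    """
    delta = np.asarray(delta, dtype=float)
    Pr = np.asarray(Pr, dtype=float)
    lhs = delta[:, None] * Pr              # lhs[i, j] = delta_t(i) p^{(r)}_{ij}(t)
    rhs = delta[l] * Pr[l, :]              # rhs[j]    = delta_t(l) p^{(r)}_{lj}(t)
    return bool(np.all(lhs <= rhs[None, :]))
```

When the check succeeds, every maximizing path to any state $r+1$ steps later can be rerouted through state $l$ at time $t$, which is what makes the Viterbi backtracking "trapped" there.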



5.1.17 Proof of (114)

Let $i\in S$ and $j\in C$ be arbitrary. Let state $j^*\in S$ be such that
$$p^{(kL+m+P-1)}_{ij}(x_{kL})=p^{(kL-1)}_{ij^*}(x_{kL})f_{j^*}(x_{2kL})p^{(m+P-1)}_{j^*j}(x_{2kL})=\nu(i,j^*)p^{(m+P-1)}_{j^*j}(x_{2kL}),$$
where
$$\nu(i,j):=p^{(kL-1)}_{ij}(x_{kL})f_j(x_{2kL}),\quad\text{for all }i,j\in S.$$
We consider the following two cases separately.

1. There exists a path realizing $p^{(kL-1)}_{ij^*}(x_{kL})$ and going through state 1 at the time of observing $x_{lL}$ for some $l\in\{k,\dots,2k\}$:
$$p^{(kL-1)}_{ij^*}(x_{kL})=p^{((l-k)L-1)}_{i1}(x_{kL})f_1(x_{lL})p^{((2k-l)L-1)}_{1j^*}(x_{lL}). \tag{115}$$
Equation (115) together with the fundamental recursion (38) yields the following:
$$\eta_ip^{(kL-1)}_{ij^*}(x_{kL})=\eta_ip^{((l-k)L-1)}_{i1}(x_{kL})f_1(x_{lL})p^{((2k-l)L-1)}_{1j^*}(x_{lL})\overset{(80),(38)}{\le}\delta_1(x_{lL})p^{((2k-l)L-1)}_{1j^*}(x_{lL}). \tag{116}$$
At the same time, the right-hand side of (116) can be expressed as follows:
$$\delta_1(x_{lL})p^{((2k-l)L-1)}_{1j^*}(x_{lL})\overset{(113),(103)}{=}\eta_1p^{((l-k)L-1)}_{11}(x_{kL})f_1(x_{lL})p^{((2k-l)L-1)}_{1j^*}(x_{lL})\overset{(37)}{\le}\eta_1p^{(kL-1)}_{1j^*}(x_{kL}). \tag{117}$$
Therefore, if there exists $l\in\{k,\dots,2k\}$ such that (115) holds, we have, by virtue of (116) and (117),
$$\eta_ip^{(kL-1)}_{ij^*}(x_{kL})\le\eta_1p^{(kL-1)}_{1j^*}(x_{kL}),\quad\text{that is,}\quad\eta_i\nu(i,j^*)\le\eta_1\nu(1,j^*). \tag{118}$$
Hence,
$$\eta_ip^{(kL+m+P-1)}_{ij}(x_{kL})=\eta_i\nu(i,j^*)p^{(m+P-1)}_{j^*j}(x_{2kL})\overset{(118)}{\le}\eta_1\nu(1,j^*)p^{(m+P-1)}_{j^*j}(x_{2kL})\overset{(37)}{\le}\eta_1p^{(kL+m+P-1)}_{1j}(x_{kL}),$$
and (114) holds.

2. Assume now that no path exists to satisfy (115). Argue as for (108) to get
$$\nu(i,j^*)\le(1-\epsilon)^{k-1}\prod_{n=k+1}^{2k}p^{(L-1)}_{11}(x_{(n-1)L})f_1(x_{nL}). \tag{119}$$
By (103), the (partial likelihood) product on the right-hand side of (119) equals $\nu(1,1)$. Thus,
$$\nu(i,j^*)p^{(m+P-1)}_{j^*j}(x_{2kL})\overset{(119)}{\le}(1-\epsilon)^{k-1}\nu(1,1)p^{(m+P-1)}_{j^*j}(x_{2kL})\overset{(89)}{\le}(1-\epsilon)^{k-1}q^{-1}\Big(\frac{K}{\delta}\Big)^{m-1}\nu(1,1)p^{(m+P-1)}_{1j}(x_{2kL})$$
$$\overset{(77),(100)}{\le}U^{-1}\nu(1,1)p^{(m+P-1)}_{1j}(x_{2kL}). \tag{120}$$
Hence,
$$\eta_ip^{(kL+m+P-1)}_{ij}(x_{kL})=\eta_i\nu(i,j^*)p^{(m+P-1)}_{j^*j}(x_{2kL})\overset{(120)}{\le}\eta_iU^{-1}\nu(1,1)p^{(m+P-1)}_{1j}(x_{2kL})\overset{(106)}{\le}\eta_1\nu(1,1)p^{(m+P-1)}_{1j}(x_{2kL}),$$
which, by virtue of (37), implies (114).



5.1.18 Completion of the s-path to $q_{1\dots M}$ and conclusion

Finally, let
$$M=2m+2Lk+P+R+2,\qquad r=kL+P+m,\qquad l=1.$$
Recall from §5.1.4 that $b_0\in C$. Since all the entries of $Q^m$ are positive, there exists a path $v_0,v_1,\dots,v_{m-1},b_0\in C$ such that $p_{v_iv_{i+1}}>0$ and $p_{v_{m-1}b_0}>0$. Similarly, there must exist a path $u_1,\dots,u_m\in C$ such that $p_{u_iu_{i+1}}>0$ for all $i=1,\dots,m-1$ and $p_{a_Pu_1}>0$ (recall that $a_P\in C$). Hence, by these and the constructions of §5.1.6, all of the transitions of the following sequence occur with positive probabilities:
$$q_{1\dots M}=v_0,v_1,\dots,v_{m-1},b_0,b_1,\dots,b_R,s_1,\dots,s_{2Lk},a_1,\dots,a_P,u_1,\dots,u_m. \tag{121}$$
Clearly, the actual probability of observing $q_{1\dots M}$ is positive, as required. By the constructions of §5.1.2 through §5.1.4, the conditional probability of $B$ below, given $q_{1\dots M}$, is evidently positive, as required:
$$B=Z^{m+1}\times\mathcal X_{b_1}\times\cdots\times\mathcal X_{b_{R-1}}\times\mathcal X_1\times\mathcal X_{s_1}\times\cdots\times\mathcal X_{s_{2Lk-1}}\times\mathcal X_1\times\mathcal X_{a_1}\times\cdots\times\mathcal X_{a_P}\times Z^m.$$
Finally, since the sequence (79) was chosen from $B$ arbitrarily (§5.1.7) and has been shown to be an $l$-barrier of order $r$, this completes the proof of the Lemma.



5.2 Proof of Lemma 3.2

Proof. We use the notation of the previous proof in §5.1. We deal with two different situations. In the first (§5.2.1), all barriers from $B$ as constructed in §5.1 are already separated; obviously, there is nothing to do in this case. The second situation (§5.2.2) is complementary, and there a simple extension immediately ensures separation.

5.2.1 All $x^M\in B$ are already separated

Recall the definition of $Z$ from §5.1.3, and consider the two cases in the definition separately. First, suppose $Z=Z\setminus\big(\bigcup_{i\in S}\mathcal X_i\big)$, in which case $Z$ and $\mathcal X_i$ are disjoint for every $i\in S$. This implies that every barrier (79), written as
$$x^M:=(z_0,z_1,\dots,z_{m-1},z_m,x'_1,\dots,x_0,x_1,\dots,x_{2Lk},x''_1,\dots,x''_P,z'_1,\dots,z'_m),$$
is already separated. Indeed, for any $w$, $1\le w\le r$, and for any $x^M\in B$, the fact that $x_{M-\max(m,w)}\notin Z$, for example, makes it impossible that $(x'_{1\dots w},x_{1\dots M-w})\in B$ for any $x'_{1\dots w}\in\mathcal X^w$. Consider now the case when $Z=Z\cap\mathcal X_s$ for some $s\in C$. Then
$$B\subset\mathcal X_s^{m+1}\times\mathcal X_{b_1}\times\cdots\times\mathcal X_{b_{R-1}}\times\mathcal X_1\times\mathcal X_{s_1}\times\cdots\times\mathcal X_{s_{2kL-1}}\times\mathcal X_1\times\mathcal X_{a_1}\times\cdots\times\mathcal X_{a_{P-1}}\times\mathcal X_s^{m+1}.$$

Let $x^M\in B$ be arbitrary. Assume first $L>1$. By construction (§5.1.4), the states $s_1,\dots,s_L$ are all distinct. We now show that $(x'_{1\dots w},x_{1\dots M-w})\notin B$ for any $x'_{1\dots w}\in\mathcal X^w$ when $1\le w\le r$. Note that the sequence
$$q_{m+2\dots m+R+2kL+P+1}=(b_1,\dots,b_{R-1},1,s_1,\dots,s_{2kL-1},1,a_1,\dots,a_{P-1},a_P)$$
is such that no two consecutive states are equal. It is straightforward to verify that there exist indices $j$, $0\le j\le m-1$, such that, when shifted $w$ positions to the right, the pair $(z_j,z_{j+1})\in\mathcal X_s\times\mathcal X_s$ would at the same time have to belong to $\mathcal X_{q_{j+1+w}}\times\mathcal X_{q_{j+2+w}}$ with $m+1\le j+1+w<j+2+w\le m+R+2kL+1+P$. This is clearly a contradiction, since $\mathcal X_{q_{j+1+w}}$ and $\mathcal X_{q_{j+2+w}}$ are disjoint for that range of indices $j$. Verifying the above fact simply amounts to checking that the inequality $\max(0,m-w)\le j\le\min(m-1,\ m+R+2kL-1+P-w)$ is consistent for any $w$ from the admissible range:

i.) When $0\ge m-w$ and $m-1\le m+R+2kL-1+P-w$ ($m\le w\le\min(r,R+2kL+P)$), $0\le j\le m-1$ is evidently consistent.

ii.) When $0\ge m-w$ and $m-1\ge m+R+2kL-1+P-w$ ($\max(m,R+2kL+P)\le w\le r$), $0\le j\le m+R+2kL-1+P-w$ is also consistent, since $m+R+2kL-1+P-r=R+kL-1\ge 0$.

iii.) When $0<m-w$ and $m-1\le m+R+2kL-1+P-w$ ($1\le w\le\min(m-1,R+2kL+P)$), $m-w\le j\le m-1$ is consistent, since $w\ge 1$.

iv.) When $0<m-w$ and $m-1>m+R+2kL-1+P-w$ ($\max(1,R+2kL+P+1)\le w\le m-1$), $m-w\le j\le m+R+2kL-1+P-w$ is consistent, since $R+2kL-1>0$.
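The four cases i.) to iv.) can also be confirmed by brute force for small parameter values. The following Python sketch (hypothetical helper `j_range_nonempty`) checks that the range for $j$ is nonempty for every admissible $w$, with $r=kL+P+m$ as in §5.1.18; it is only a sanity check, not a substitute for the case analysis.

```python
def j_range_nonempty(m, R, k, L, P):
    """True if max(0, m - w) <= min(m - 1, m + R + 2kL - 1 + P - w) for all 1 <= w <= r."""
    r = k * L + P + m
    return all(
        max(0, m - w) <= min(m - 1, m + R + 2 * k * L - 1 + P - w)
        for w in range(1, r + 1)
    )
```

The check fails for $m=0$, which is consistent with the proof needing at least one pair $(z_j,z_{j+1})$.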

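Next consider the case $L=1$ but $s\ne 1$ (that is, $P>0$). Then
$$B\subset\mathcal X_s^{m+1}\times\mathcal X_{b_1}\times\cdots\times\mathcal X_{b_{R-1}}\times\mathcal X_1^{2k+1}\times\mathcal X_{a_1}\times\cdots\times\mathcal X_{a_{P-1}}\times\mathcal X_s^{m+1}.$$
If $s\ne 1$, then also $b_i\ne 1$, $i=1,\dots,R-1$, and $a_i\ne 1$, $i=1,\dots,P-1$. To see that $x^M$ is separated in this case, simply note that $x_{M-\max(w,m+1)}\notin\mathcal X_s$ for any admissible $w$.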

5.2.2 Barriers $x^M\in B$ need not be separated

Finally, we consider the case when $L=1$ and $s=1$ (where $s\in C$ is such that $Z=Z\cap\mathcal X_s$). This implies that $P=0$, $1\in C$, and $p_{11}>0$, which in turn implies that $R=1$, and
$$B\subset\mathcal X_1^{m+1}\times\mathcal X_1^{2k+1}\times\mathcal X_1^{m+1}=\mathcal X_1^{2m+2k+3}.$$
Clearly, the barriers from $B$ need not be, and indeed are not, separated. It is, however, easy to extend them to separated ones. Indeed, let $q_0\ne 1$ be such that $p_{q_01}>0$ and redefine $B:=\mathcal X_{q_0}\times B$. Evidently, any shift of any $x^M\in B$ by $w$ ($1\le w\le r$) positions to the right makes it impossible for $x_1$ to be simultaneously in $\mathcal X_{q_0}$ and in $\mathcal X_1$ (since the latter sets are disjoint, §5.1.2). ■
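The shift arguments of §5.2.1 and §5.2.2 only use the disjointness of differently labelled sets: a shift by $w$ is impossible as soon as some coordinate would have to lie in two different sets. Under that simplification (each coordinate of $B$ is described by a label naming its set, and distinct labels are assumed to denote disjoint sets), the hypothetical helper `is_separated` below reproduces the conclusion above: a constant barrier is not separated, while prepending $\mathcal X_{q_0}$ separates it.

```python
def is_separated(labels, r):
    """labels[i] names the (pairwise disjoint) set containing coordinate i of B.

    A right shift by w is ruled out as soon as labels[i] != labels[i + w]
    for some i, because coordinate i would have to lie in two disjoint sets.
    """
    M = len(labels)
    return all(
        any(labels[i] != labels[i + w] for i in range(M - w))
        for w in range(1, r + 1)
    )
```

The first coordinate is the only one that differs after the extension, which is why a single extra factor $\mathcal X_{q_0}$ suffices.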

5.3 Proof of Proposition 4.1

Proof. Let us additionally define the following non-overlapping block-valued processes
$$U^M_m:=(X_{(m-1)M+1},\dots,X_{mM}),\qquad D^M_m:=(Y_{(m-1)M+1},\dots,Y_{mM}),\qquad m=1,2,\dots,$$
and stopping times
$$\nu'_0:=\min\{m\ge 1: U^M_m\in B,\ D^M_m=q\},\qquad\nu'_l:=\min\{m>\nu'_{l-1}: U^M_m\in B,\ D^M_m=q\},\quad l=1,2,\dots, \tag{122}$$
$$R_0:=\min\{m\ge 1: D^M_m=q\},\qquad R_l:=\min\{m>R_{l-1}: D^M_m=q\},\quad l=1,2,\dots \tag{123}$$
The process $D^M$ is clearly a time-homogeneous, finite-state Markov chain. Since $Y$ is aperiodic and irreducible, so is $D^M$. Hence $(D^M,U^M)$ is also an HMM. Since $Y$ is stationary (under $\pi$), $q$ occurs in every interval of length $M$ with the same positive probability (Lemma 3.2). In particular, $q$ belongs to the state space of $D^M$. Since $D^M$ is irreducible and its state space is finite, all of its states, including $q$, are positive recurrent. Hence $E_{\pi'}(R_0)<\infty$ and $E_{\pi'}(R_1-R_0)<\infty$; recall also that $R_i-R_{i-1}$, $i=1,2,\dots$, are i.i.d. (These and the statements below hold for any initial distribution $\pi'$.) The following two relations are straightforward to verify and ultimately yield the second statement: $E_{\pi'}(\nu_1-\nu_0)\le E_{\pi'}(\nu'_1-\nu'_0)=\frac{1}{\gamma^*}E_{\pi'}(R_1-R_0)<\infty$. The equality above is a simple extension of Wald's equation (for a general reference see, for example, …). It can similarly be verified that $E_{\pi'}\nu'_0=E_{\pi'}R_0+\frac{1-\gamma^*}{\gamma^*}E_{\pi'}(R_1-R_0)$, which is again finite. Finally, $E_{\pi'}\nu_0\le M(E_{\pi'}\nu'_0-1)+1<\infty$. ■

5.4 Proof of Proposition 4.2

Proof. Recall the definition of the stopping times $\tau_i$, according to which, for each $i=1,2,\dots$, the underlying Markov chain satisfies $Y_{\tau_i}=l$. Hence, the behavior of $X$ after $\tau_i$ does not depend on the behavior of $X$ up to $\tau_i$. Together with the fact that the $\tau_i$ are renewal times, this establishes the regenerativity of $X$. Next, to every $\tau_i$ there corresponds an $r$-order $l$-node, and $\tau_i$ is always $u_j$ for some $j\ge i$. This means that all the nodes corresponding to the $\tau_i$'s are also used to define the alignment as in Definition 4.1. Therefore, the alignment up to $\tau_i$ does not depend on the alignment after $\tau_i$. In other words, the segment of the alignment process that corresponds to $\tau_i$ is a function of the segment of $X$ corresponding to the same $\tau_i$. Formally,
$$(V_s: s\in\{\tau_{i-1}+1,\dots,\tau_i\})=V^{(i)}(X_s: s\in\{\tau_{i-1}+1,\dots,\tau_i\}).$$
Thus, the process $Z$ is regenerative with respect to $\tau$. ■



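5.5 Proof of Theorem 4.4

Proof. First note that the right-hand side of (65) does define a measure.

The proof below uses the regenerativity of $Z$ in a standard way. For every $n>\tau_0$, every $A\in\mathcal B$, and every $l\in S$, we have
$$\frac1n\sum_{i=1}^{n}I_{A\times l}(Z_i)=\frac1n\sum_{i=1}^{\tau_0}I_{A\times l}(Z_i)+\frac1n\sum_{i=\tau_0+1}^{\tau_{k(n)}}I_{A\times l}(Z_i)+\frac1n\sum_{i=\tau_{k(n)}+1}^{n}I_{A\times l}(Z_i),$$
where $k(n)=\max\{k:\tau_k\le n\}$ stands for the renewal process. Now, since $\tau_0<\infty$ a.s., we have
$$\frac1n\sum_{i=1}^{\tau_0}I_{A\times l}(Z_i)\le\frac{\tau_0}{n}\to 0,\quad\text{a.s.}$$
Let $\mu=E\tau^o_1$; recall that $\mu<\infty$. Then
$$\frac{n-\tau_{k(n)}}{n}\to 0,\quad\text{a.s.}$$
Finally, since $Z$ is regenerative with respect to $\tau_0,\tau_1,\dots$, we have
$$\frac1n\sum_{i=\tau_0+1}^{\tau_{k(n)}}I_{A\times l}(Z_i)=\frac{k(n)}{n}\cdot\frac{1}{k(n)}\sum_{k=1}^{k(n)}\zeta_k,$$
where
$$\zeta_k=\sum_{i=\tau_{k-1}+1}^{\tau_k}I_{A\times l}(Z_i),\quad k=1,2,\dots,$$
are i.i.d. random variables. Let $m_l(A)$ stand for $E\zeta_k$. Thus, $m_l(A)\le\mu<\infty$. Then, as $n\to\infty$, we have
$$\frac{k(n)}{n}\to\frac1\mu\quad\text{and}\quad\frac{1}{k(n)}\sum_{k=1}^{k(n)}\zeta_k\to m_l(A),\quad\text{a.s.}$$
Let us calculate $m_l(A)$. Clearly,
$$m_l(A)=E\sum_{i=1}^{\tau^o_1}I_{A\times l}(Z^o_i).$$
Now
$$m_l(A)=E\sum_{i=1}^{\tau^o_1}I_{A\times l}(Z^o_i)=\sum_{j=1}^{\infty}E\Big(\sum_{i=1}^{j}I_{A\times l}(Z^o_i)\,\Big|\,\tau^o_1=j\Big)P(\tau^o_1=j)$$
$$=\sum_{j=1}^{\infty}\sum_{i=1}^{j}E\big(I_{A\times l}(Z^o_i)\,\big|\,\tau^o_1=j\big)P(\tau^o_1=j)$$
$$=\sum_{j=1}^{\infty}P(Z^o_1\in A\times l\mid\tau^o_1=j)P(\tau^o_1=j)+\sum_{j=2}^{\infty}P(Z^o_2\in A\times l\mid\tau^o_1=j)P(\tau^o_1=j)+\cdots$$
$$=P(Z^o_1\in A\times l,\ \tau^o_1\ge 1)+P(Z^o_2\in A\times l,\ \tau^o_1\ge 2)+\cdots$$
$$=\sum_{i=1}^{\infty}P(Z^o_i\in A\times l,\ \tau^o_1\ge i)\le\sum_{i=1}^{\infty}P(\tau^o_1\ge i)=\mu<\infty.$$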
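The interchange of summations used to compute $m_l(A)$ can be checked exactly on a toy cycle distribution with finitely many outcomes. In the sketch below (hypothetical names; exact arithmetic via `fractions.Fraction`), each cycle outcome is a probability together with a realized sequence $Z^o_1,\dots,Z^o_{\tau^o_1}$, and membership in $A\times l$ is a predicate.

```python
from fractions import Fraction


def cycle_mean_two_ways(cycles, in_A):
    """cycles: list of (probability, (z_1, ..., z_tau)) describing one regeneration cycle.

    Returns E sum_{i <= tau} I_A(z_i), the same quantity computed as
    sum_i P(z_i in A, tau >= i), and E tau = sum_i P(tau >= i).
    """
    direct = sum(p * sum(1 for z in seq if in_A(z)) for p, seq in cycles)
    longest = max(len(seq) for _, seq in cycles)
    by_index = sum(
        sum(p for p, seq in cycles if len(seq) >= i and in_A(seq[i - 1]))
        for i in range(1, longest + 1)
    )
    mean_length = sum(p * len(seq) for p, seq in cycles)
    return direct, by_index, mean_length
```

For example, with cycles `("a",)`, `("b", "a")`, `("a", "a", "b")` of probabilities 1/2, 1/3, 1/6 and $A$ = `{"a"}`, both ways of computing the cycle mean agree and stay below the mean cycle length, mirroring $m_l(A)\le\mu$.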

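Similarly,
$$\frac1n\sum_{i=1}^{n}I_{\mathcal X\times l}(Z_i)\to\frac{w_l}{\mu},\quad\text{a.s.},$$
where $w_l:=m_l(\mathcal X)>0$. Hence, we have shown that for each $l\in S$ and for every $A\in\mathcal B$
$$\frac{\sum_{i=1}^{n}I_{A\times l}(Z_i)}{\sum_{i=1}^{n}I_{\mathcal X\times l}(Z_i)}\to\frac{m_l(A)}{w_l},\quad\text{a.s.}$$
Recalling that $\mathcal X$ is a separable metric space and invoking the theory of weak convergence of measures now establishes the a.s. weak convergence of these empirical measures to $Q_l$.

It remains to show that for all $l\in S$ and $A\in\mathcal B$
$$P^l_n(A)\to\frac{m_l(A)}{w_l},\quad\text{a.s.} \tag{126}$$
To see this, consider $\sum_{i=1}^{n}I_{A\times l}(X_i,V^n_i)$. Since $V^n_i=V_i$ if $i\le\tau_{k(n)}$, we obtain
$$\frac1n\sum_{i=1}^{n}I_{A\times l}(X_i,V^n_i)=\frac1n\sum_{i=1}^{\tau_0}I_{A\times l}(Z_i)+\frac1n\sum_{i=\tau_0+1}^{\tau_{k(n)}}I_{A\times l}(Z_i)+\frac1n\sum_{i=\tau_{k(n)}+1}^{n}I_{A\times l}(X_i,V^n_i)\to\frac{m_l(A)}{\mu},\quad\text{a.s.}$$
Similarly,
$$\frac1n\sum_{i=1}^{n}I_l(V^n_i)\to\frac{w_l}{\mu},\quad\text{a.s.} \tag{127}$$
These convergences prove (126).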